GENES ENCODING OLFACTORY RECEPTORS AND BIALLELIC MARKERS THEREOF

FIELD OF THE INVENTION 

The present invention pertains to a purified or isolated nucleic acid comprising ten open 
reading frames (ORFs) encoding ten different olfactory receptor-like proteins, non-coding regions
flanking the ORFs as well as fragments thereof. The invention also provides recombinant expression 
vectors and recombinant cell hosts containing a nucleic acid encoding said olfactory receptor 
proteins. The invention also concerns the olfactory receptor proteins encoded by these ORFs as well 
as polypeptides that are homologous to said olfactory receptor proteins and the peptide fragments of 
both the olfactory receptor proteins and their homologous polypeptide counterparts. The invention
also deals with antibodies directed specifically against such polypeptides that are useful as 
diagnostic reagents. The invention further encompasses biallelic markers of the olfactory receptor 
gene useful in genetic analysis. The invention also deals with methods and kits for the detection of 
the olfactory receptor proteins and with methods and kits for screening ligand molecules binding to 
these proteins.

BACKGROUND OF THE INVENTION 

Throughout this application, various bibliographic publications are cited. Full bibliographic 
references for these publications may be found at the end of this application, preceding the sequence 
listing and the claims. 

OLFACTORY SYSTEM

The olfactory receptor cells, the first cells in the pathway that give rise to the sense of smell, 
lie in a small patch of membrane, the olfactory epithelium, in the upper part of the nasal cavity. 
These cells are specialized afferent neurons that have an enlarged extension analogous to a dendrite. 
Several long hairlike processes extend out from this extension along the surface of the olfactory 
epithelium where they are bathed in mucus. The hairlike processes contain the receptor proteins for
olfactory stimuli. The axons of these neurons form the olfactory nerve. 

For an odorous substance, called an odorant, to be detected, molecules of the
substance must first diffuse into the air and pass into the nose to the region of the olfactory
epithelium. Once there, they dissolve in the mucus that covers the epithelium and then bind to
specific receptor proteins on the cilia.

Although there are many thousands of olfactory neurons, each contains one, or at most a 
few, of the 1,000 or so different receptor types, each of which responds only to a specific chemically 
related group of odorant molecules. Each odorant has characteristic chemical groups that distinguish 
it from other odorants, and each of these groups activates a different receptor type. Thus the identity
of a particular odorant is determined by the activation of a precise combination of receptors, each of
which is contained in a distinct group of olfactory neurons. 

The axons of the olfactory neurons synapse in the brain structures known as olfactory bulbs, 
which lie on the undersurface of the frontal lobes. Axons from olfactory neurons sharing a common 
receptor specificity synapse together on certain olfactory-bulb neurons, thereby maintaining the
specificity of the original stimuli. 

OLFACTORY RECEPTORS 

In contrast with the immunoglobulin system, the diversity of olfactory receptors is encoded 
by a large germ-line repertoire of olfactory receptor genes. The size of the olfactory receptor gene
family in the human genome is unknown but it has been estimated to encompass 200 to 1,000 genes.

The locations of only a few human genes have been determined to date. The picture that has
emerged so far is that several large clusters of olfactory genes and pseudogenes span hundreds of
kilobases on several chromosomes. Using FISH analyses, more than 25 distinct locations of
olfactory receptor genes have been identified in the human genome.

In mammals, the olfactory epithelium appears to be organized into distinct topographic
regions or zones in which expression of a particular receptor gene appears to be restricted to one of
the four zones in the epithelium. Within the zone, the distribution of neurons expressing a given 
receptor is random. Chromosomal mapping studies have revealed clusters of odorant receptor genes 
at a single locus, and numerous such loci have been mapped to different chromosomes. However,
receptors expressed in the same zone map to different loci, and a single locus can contain genes
expressed in different zones. A putative odorant receptor promoter, consisting of the 6.7 kb DNA 
fragment upstream of the receptor coding region, has been shown to be sufficient to direct olfactory 
receptor expression in a tissue-specific, zonal-specific manner. 

Olfactory receptors share a seven-transmembrane domain structure (TM1 to TM7) with
many neurotransmitter and hormone receptors. They show a high degree of sequence similarity in
some conserved domains (TM2 and TM7) as well as regions of diversity (TM3, TM4, TM5, and 
TM6). They are responsible for the recognition and G protein-mediated transduction of odorant 
signals. The genes encoding these receptors are devoid of introns within their coding regions. 

Olfactory receptors display all hallmarks of the G-protein coupled receptor superfamily but
also have some unique motifs. Most notably, they appear to be minimal in structure with very short
cytoplasmic and extracellular loops. In addition, they display a striking structural diversity in the
third, fourth and fifth transmembrane domains, which are thought to form the hydrophobic core of
these proteins and may form the ligand binding site of the receptors.

An understanding of the genetic basis of olfaction and a knowledge of olfactory receptors
are important to enable the design of fragrances, the identification of compounds which control
appetite, or the detection of compounds which can be harmful or dangerous.

SUMMARY OF THE INVENTION

This invention provides a nucleic acid molecule encoding ten different olfactory receptor-like
proteins (OLF).

The invention also deals with a nucleic acid molecule comprising a nucleotide sequence 
encoding an olfactory receptor-like protein, which nucleotide sequence is selected from the group
consisting of SEQ ID Nos 2-11, as well as with the corresponding polypeptide encoded by this
nucleotide sequence and with antibodies directed against the corresponding polypeptide. 

Oligonucleotide probes or primers hybridizing specifically with an olfactory receptor 
genomic sequence are also part of the present invention, as well as DNA amplification and detection 
methods using said primers and probes.

The invention also concerns a purified and/or isolated biallelic marker located in the 
sequence of the olfactory receptor gene cluster of the invention, wherein said biallelic marker is 
useful as a diagnostic tool in order to detect an allele associated with a specific phenotype relating
to the olfactory system, including an alteration of the olfactory perception of substances or
molecules.

A further object of the invention consists of recombinant vectors comprising any of the 
nucleic acid sequences described above, and in particular of recombinant vectors comprising a 
sequence encoding an olfactory receptor protein, as well as of cell hosts and transgenic non-human
animals comprising said nucleic acid sequences or recombinant vectors.

A further object of the invention consists of methods for screening substances or molecules
interacting with an olfactory receptor encoded by any of the nucleic acid molecules described above.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1: Alignment of the amino acid sequences of the olfactory polypeptides encoded by
the open reading frames of the olfactory receptor gene cluster of the invention. The lower line
represents the consensus sequence. The locations of the seven transmembrane domains TM1 to TM7 
are boxed. 

BRIEF DESCRIPTION OF THE SEQUENCES PROVIDED IN THE SEQUENCE LISTING

SEQ ID No 1 contains the olfactory receptor genomic sequence.

SEQ ID Nos 2-11 contain the nucleotide sequences of the open reading frames of
SEQ ID No 1 encoding the OLF1 to OLF10 polypeptides.

SEQ ID Nos 12-21 contain the amino acid sequences of the OLF1 to OLF10 polypeptides
encoded by the open reading frames of SEQ ID Nos 2-11.

SEQ ID Nos 22-25 contain the amplification primers used for FISH experiments described
in Example 1.

SEQ ID No 26 contains a primer containing the additional PU 5' sequence described further
in Example 3.

SEQ ID No 27 contains a primer containing the additional RP 5' sequence described further
in Example 3.

In accordance with the regulations relating to Sequence Listings, the following codes have 
been used in the Sequence Listing to indicate the locations of biallelic markers within the sequences 
and to identify each of the alleles present at the polymorphic base. The code "r" in the sequences
indicates that one allele of the polymorphic base is a guanine, while the other allele is an adenine.
The code "y" in the sequences indicates that one allele of the polymorphic base is a thymine, while
the other allele is a cytosine. The code "m" in the sequences indicates that one allele of the
polymorphic base is an adenine, while the other allele is a cytosine. The code "k" in the sequences
indicates that one allele of the polymorphic base is a guanine, while the other allele is a thymine.
The code "s" in the sequences indicates that one allele of the polymorphic base is a guanine, while
the other allele is a cytosine. The code "w" in the sequences indicates that one allele of the
polymorphic base is an adenine, while the other allele is a thymine.
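
The code-to-allele correspondence described above can be sketched as a small lookup table. This is an illustrative sketch only: the names BIALLELIC_CODES, alleles_for_code, alternative_allele and find_marker_positions are hypothetical and do not appear in the Sequence Listing.

```python
# Two-allele ambiguity codes exactly as listed in the paragraph above.
BIALLELIC_CODES = {
    "r": frozenset("GA"),
    "y": frozenset("TC"),
    "m": frozenset("AC"),
    "k": frozenset("GT"),
    "s": frozenset("GC"),
    "w": frozenset("AT"),
}


def alleles_for_code(code):
    """Return the two alleles represented by a biallelic ambiguity code."""
    return BIALLELIC_CODES[code.lower()]


def alternative_allele(code, original_allele):
    """Given a code and the original allele, return the other allele."""
    alleles = alleles_for_code(code)
    original = original_allele.upper()
    if original not in alleles:
        raise ValueError(f"{original!r} is not an allele of code {code!r}")
    (other,) = alleles - {original}
    return other


def find_marker_positions(sequence):
    """Return (0-based index, code) for each ambiguity code in a sequence."""
    return [
        (index, base)
        for index, base in enumerate(sequence.lower())
        if base in BIALLELIC_CODES
    ]
```

For instance, a marker written "r" whose original allele is G has A as its alternative allele under this mapping.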

The nucleotide code of the original allele for each biallelic marker is the following: 



Biallelic marker    Original allele
99-13670-305        G
99-13669-471        G
99-13666-275        A
99-13664-221        T
99-13663-218        G
99-13660-277        C
99-13652-407        G
99-13652-357        A
99-13652-308        A
99-13671-396        A
99-13649-286        C
99-13648-259        G
99-13647-278        G



DETAILED DESCRIPTION OF THE INVENTION 

The aim of the present invention is to provide polynucleotides and polypeptides related to
novel olfactory receptors, notably useful in order to design suitable means for detecting specific
odorant molecules in a material sample, particularly in a material sample suspected to contain an
odorant molecule that consists of one of the specific ligands for the olfactory receptors of the
invention.

DEFINITIONS

Before describing the invention in greater detail, the following definitions are set forth to 
illustrate and define the meaning and scope of the terms used to describe the invention herein. 

General definitions 

The terms "olfactory receptor gene" or "OLF1 to OLF10 genes", when used herein,
encompass genomic, mRNA and cDNA sequences encoding the OLF1 to OLF10 olfactory
receptor proteins.

The term "heterologous protein", when used herein, is intended to designate any protein or
polypeptide other than the OLF1 to OLF10 proteins.

The term "isolated" requires that the material be removed from its original environment
(e.g., the natural environment if it is naturally occurring). For example, a naturally-occurring
polynucleotide or polypeptide present in a living animal is not isolated, but the same polynucleotide 
or DNA or polypeptide, separated from some or all of the coexisting materials in the natural system, 
is isolated. Such polynucleotide could be part of a vector and/or such polynucleotide or polypeptide
could be part of a composition, and still be isolated in that the vector or composition is not part of its
natural environment. 

The term "purified" does not require absolute purity; rather, it is intended as a relative
definition. Purification of starting material or natural material to at least one order of magnitude,
preferably two or three orders, and more preferably four or five orders of magnitude is expressly
contemplated. As an example, purification from 0.1 % concentration to 10 % concentration is two
orders of magnitude. The term "purified polynucleotide" is used herein to describe a polynucleotide
or polynucleotide vector of the invention which has been separated from other compounds including,
but not limited to other nucleic acids, carbohydrates, lipids and proteins (such as the enzymes used
in the synthesis of the polynucleotide), or the separation of covalently closed polynucleotides from
linear polynucleotides. A polynucleotide is substantially pure when at least about 50%, preferably
60 to 75% of a sample exhibits a single polynucleotide sequence and conformation (linear versus
covalently closed). A substantially pure polynucleotide typically comprises about 50%, preferably 60
to 90% weight/weight of a nucleic acid sample, more usually about 95%, and preferably is over
about 99% pure. Polynucleotide purity or homogeneity is indicated by a number of means well
known in the art, such as agarose or polyacrylamide gel electrophoresis of a sample, followed by
visualizing a single polynucleotide band upon staining the gel. For certain purposes higher
resolution can be provided by using HPLC or other means well known in the art.
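
The arithmetic behind the example above (0.1 % to 10 % being two orders of magnitude) can be checked with a short sketch. The function name orders_of_magnitude is hypothetical and is introduced here purely for illustration.

```python
import math


def orders_of_magnitude(start_pct, end_pct):
    """Return the base-10 log of the fold enrichment between two concentrations.

    Both concentrations are given in the same unit (here, percent) and
    must be strictly positive.
    """
    if start_pct <= 0 or end_pct <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(end_pct / start_pct)


# Purification from 0.1 % to 10 % is a 100-fold enrichment,
# i.e. about two orders of magnitude.
enrichment = orders_of_magnitude(0.1, 10.0)
```

Under this reading, the "four or five orders" mentioned as more preferable would correspond to a 10,000-fold to 100,000-fold enrichment.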

The term "polypeptide" refers to a polymer of amino acids without regard to the length of
the polymer; thus, peptides, oligopeptides, and proteins are included within the definition of
polypeptide. This term also does not specify or exclude post-expression modifications of
polypeptides; for example, polypeptides which include the covalent attachment of glycosyl groups,
acetyl groups, phosphate groups, lipid groups and the like are expressly encompassed by the term
polypeptide. Also included within the definition are polypeptides which contain one or more
analogs of an amino acid (including, for example, non-naturally occurring amino acids, amino acids 
which only occur naturally in an unrelated biological system, modified amino acids from 
mammalian systems etc.), polypeptides with substituted linkages, as well as other modifications 
known in the art, both naturally occurring and non-naturally occurring.

The term "recombinant polypeptide" is used herein to refer to polypeptides that have been
artificially designed and which comprise at least two polypeptide sequences that are not found as 
contiguous polypeptide sequences in their initial natural environment, or to refer to polypeptides 
which have been expressed from a recombinant polynucleotide. 

The term "purified polypeptide" is used herein to describe a polypeptide of the invention
which has been separated from other compounds including, but not limited to nucleic acids, lipids,
carbohydrates and other proteins. A polypeptide is substantially pure when at least about 50%, 
preferably 60 to 75% of a sample exhibits a single polypeptide sequence. A substantially pure 
polypeptide typically comprises about 50%, preferably 60 to 90% weight/weight of a protein sample,
more usually about 95%, and preferably is over about 99% pure. Polypeptide purity or homogeneity
is indicated by a number of means well known in the art, such as polyacrylamide gel electrophoresis 
of a sample, followed by visualizing a single polypeptide band upon staining the gel. For certain 
purposes higher resolution can be provided by using HPLC or other means well known in the art. 

As used herein, the term "non-human animal" refers to any non-human vertebrate, birds and
more usually mammals, preferably primates, farm animals such as swine, goats, sheep, donkeys, and
horses, rabbits or rodents, more preferably rats or mice. As used herein, the term "animal" is used to
refer to any vertebrate, preferably a mammal. Both the terms "animal" and "mammal" expressly
embrace human subjects unless preceded with the term "non-human".

As used herein, the term "antibody" refers to a polypeptide or group of polypeptides which
are comprised of at least one binding domain, where an antibody binding domain is formed from the
folding of variable domains of an antibody molecule to form three-dimensional binding spaces with
an internal surface shape and charge distribution complementary to the features of an antigenic
determinant of an antigen, which allows an immunological reaction with the antigen. Antibodies
include recombinant proteins comprising the binding domains, as well as fragments, including Fab,
Fab', F(ab)2, and F(ab')2 fragments.

As used herein, an "antigenic determinant" is the portion of an antigen molecule, in this case
an OLF1 to OLF10 polypeptide, that determines the specificity of the antigen-antibody reaction. An
"epitope" refers to an antigenic determinant of a polypeptide. An epitope can comprise as few as 3
amino acids in a spatial conformation which is unique to the epitope. Generally an epitope
comprises at least 6 such amino acids, and more usually at least 8-10 such amino acids. Methods for
determining the amino acids which make up an epitope include X-ray crystallography, 2-dimensional



WO 00/21985 PCT/IB99/01729 

nuclear magnetic resonance, and epitope mapping, e.g. the Pepscan method described by Geysen et
al. 1984; PCT Publication No. WO 84/03564; and PCT Publication No. WO 84/03506.

Throughout the present specification, the expression "nucleotide sequence" may be
employed to designate indifferently a polynucleotide or a nucleic acid. More precisely, the
expression "nucleotide sequence" encompasses the nucleic material itself and is thus not restricted to
the sequence information (i.e. the succession of letters chosen among the four base letters) that 
biochemically characterizes a specific DNA or RNA molecule. 

As used interchangeably herein, the terms "nucleic acids", "oligonucleotides", and
"polynucleotides" include RNA, DNA, or RNA/DNA hybrid sequences of more than one nucleotide
in either single chain or duplex form. The term "nucleotide" is used herein as an adjective to
describe molecules comprising RNA, DNA, or RNA/DNA hybrid sequences of any length in
single-stranded or duplex form. The term "nucleotide" is also used herein as a noun to refer to
individual nucleotides or varieties of nucleotides, meaning a molecule, or individual unit in a larger
nucleic acid molecule, comprising a purine or pyrimidine, a ribose or deoxyribose sugar moiety, and a
phosphate group, or phosphodiester linkage in the case of nucleotides within an oligonucleotide or
polynucleotide. The term "nucleotide" is also used herein to encompass "modified nucleotides"
which comprise at least one of the following modifications: (a) an alternative linking group, (b) an
analogous form of purine, (c) an analogous form of pyrimidine, or (d) an analogous sugar. For
examples of analogous linking groups, purines, pyrimidines, and sugars, see for example PCT
publication No. WO 95/04064. The polynucleotide sequences of the invention may be prepared by any
known method, including synthetic, recombinant, ex vivo generation, or a combination thereof, as well
as utilizing any purification methods known in the art.

A "promoter" refers to a DNA sequence recognized by the synthetic machinery of the cell
required to initiate the specific transcription of a gene. 

A sequence which is "operably linked" to a regulatory sequence such as a promoter means
that said regulatory element is in the correct location and orientation in relation to the nucleic acid to
control RNA polymerase initiation and expression of the nucleic acid of interest. As used herein, the 
term "operably linked" refers to a linkage of polynucleotide elements in a functional relationship. 
For instance, a promoter or enhancer is operably linked to a coding sequence if it affects the
transcription of the coding sequence. More precisely, two DNA molecules (such as a polynucleotide
containing a promoter region and a polynucleotide encoding a desired polypeptide or 
polynucleotide) are said to be "operably linked" if the nature of the linkage between the two 
polynucleotides does not (1) result in the introduction of a frame-shift mutation or (2) interfere with 
the ability of the polynucleotide containing the promoter to direct the transcription of the coding
polynucleotide.

The term "vector" is used herein to designate either a circular or a linear DNA or RNA
molecule, which is either double-stranded or single-stranded, and which comprises at least one
polynucleotide of interest that is sought to be transferred into a cell host or into a unicellular or
multicellular host organism.

The term "primer" denotes a specific oligonucleotide sequence which is complementary to a
target nucleotide sequence and used to hybridize to the target nucleotide sequence. A primer serves
as an initiation point for nucleotide polymerization catalyzed by either DNA polymerase, RNA
polymerase or reverse transcriptase. 

The term "probe" denotes a defined nucleic acid segment (or nucleotide analog segment,
e.g., polynucleotide as defined hereinbelow) which can be used to identify a specific polynucleotide
sequence present in samples, said nucleic acid segment comprising a nucleotide sequence
complementary to the specific polynucleotide sequence to be identified.

The terms "trait" and "phenotype" are used interchangeably herein and refer to any visible,
detectable or otherwise measurable property of an organism, such as symptoms of, or susceptibility
to, a disease.

The term "allele" is used herein to refer to variants of a nucleotide sequence. A biallelic
polymorphism has two forms. Diploid organisms may be homozygous or heterozygous for an allelic
form. 

The term "genotype" as used herein refers to the identity of the alleles present in an individual
or a sample. In the context of the present invention, a genotype preferably refers to the description
of the biallelic marker alleles present in an individual or a sample. The term "genotyping" a sample
or an individual for a biallelic marker involves determining the specific allele or the specific
nucleotide carried by an individual at a biallelic marker.

The term "mutation" as used herein refers to a difference in DNA sequence between or
among different genomes or individuals which has a frequency below 1%. 

The term "polymorphism" as used herein refers to the occurrence of two or more alternative
genomic sequences or alleles between or among different genomes or individuals. "Polymorphic"
refers to the condition in which two or more variants of a specific genomic sequence can be found in 
a population. A "polymorphic site" is the locus at which the variation occurs. A single nucleotide 
polymorphism is the replacement of one nucleotide by another nucleotide at the polymorphic site. 
Deletion of a single nucleotide or insertion of a single nucleotide also gives rise to single nucleotide
polymorphisms. In the context of the present invention, "single nucleotide polymorphism"
preferably refers to a single nucleotide substitution. Typically, between different individuals, the 
polymorphic site may be occupied by two different nucleotides. 

The terms "biallelic polymorphism" and "biallelic marker" are used interchangeably herein to
refer to a single nucleotide polymorphism having two alleles at a fairly high frequency in the
population. A "biallelic marker allele" refers to the nucleotide variants present at a biallelic marker
site.

The location of nucleotides in a polynucleotide with respect to the center of the
polynucleotide is described herein in the following manner. When a polynucleotide has an odd
number of nucleotides, the nucleotide at an equal distance from the 3' and 5' ends of the
polynucleotide is considered to be "at the center" of the polynucleotide, and any nucleotide
immediately adjacent to the nucleotide at the center, or the nucleotide at the center itself, is
considered to be "within 1 nucleotide of the center." With an odd number of nucleotides in a
polynucleotide, any of the five nucleotide positions in the middle of the polynucleotide would be
considered to be within 2 nucleotides of the center, and so on. When a polynucleotide has an even
number of nucleotides, there would be a bond and not a nucleotide at the center of the
polynucleotide. Thus, either of the two central nucleotides would be considered to be "within 1
nucleotide of the center" and any of the four nucleotides in the middle of the polynucleotide would
be considered to be "within 2 nucleotides of the center", and so on.
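
The "within n nucleotides of the center" convention defined above can be expressed as a short sketch. The function name positions_within_center and the use of 0-based positions are assumptions made for illustration; the specification itself does not prescribe an indexing scheme.

```python
def positions_within_center(length, n):
    """Return the 0-based positions lying within n nucleotides of the center.

    Odd length: the center is a nucleotide, so 2n + 1 positions qualify.
    Even length: the center is a bond, so 2n positions qualify.
    Positions are clipped to the ends of the polynucleotide.
    """
    if length < 1 or n < 0:
        raise ValueError("length must be >= 1 and n must be >= 0")
    if length % 2 == 1:
        center = length // 2
        low, high = center - n, center + n
    else:
        low, high = length // 2 - n, length // 2 + n - 1
    return list(range(max(low, 0), min(high, length - 1) + 1))


# A 47-mer has its central nucleotide at 0-based position 23 (the 24th base).
central_47 = positions_within_center(47, 0)
```

With n = 0 the function returns the single central nucleotide for an odd length and an empty list for an even length, which mirrors the statement that an even-length polynucleotide has a bond, not a nucleotide, at its center.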

Biallelic markers can be defined as genome-derived polynucleotides having between 2 and 
100, preferably between 20, 30, or 40 and 60, and more preferably about 47 nucleotides in length,
which exhibit biallelic polymorphism at one single base position. Each biallelic marker therefore
corresponds to two forms of a polynucleotide sequence included in a gene which, when compared 
with one another, present a nucleotide modification at one position. 

The term "upstream" is used herein to refer to a location which is toward the 5' end of the
polynucleotide from a specific reference point.

The terms "base paired" and "Watson & Crick base paired" are used interchangeably herein
to refer to nucleotides which can be hydrogen bonded to one another by virtue of their sequence
identities in a manner like that found in double-helical DNA with thymine or uracil residues linked
to adenine residues by two hydrogen bonds and cytosine and guanine residues linked by three
hydrogen bonds (See Stryer, L., Biochemistry, 4th edition, 1995).

The terms "complementary" or "complement thereof" are used herein to refer to the
sequences of polynucleotides which are capable of forming Watson & Crick base pairing with another
specified polynucleotide throughout the entirety of the complementary region. For the purpose of the
present invention, a first polynucleotide is deemed to be complementary to a second polynucleotide
when each base in the first polynucleotide is paired with its complementary base. Complementary
bases are, generally, A and T (or A and U), or C and G. "Complement" is used herein as a synonym
for "complementary polynucleotide", "complementary nucleic acid" and "complementary
nucleotide sequence". These terms are applied to pairs of polynucleotides based solely upon their
sequences and not any particular set of conditions under which the two polynucleotides would
actually bind.
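
Because complementarity as defined above depends solely on sequence, it can be tested computationally. The following is a minimal sketch for DNA only; it assumes that both sequences are written 5' to 3' and pair in antiparallel orientation, which the definition does not state explicitly. The names DNA_COMPLEMENT, reverse_complement and is_complementary are hypothetical.

```python
# Base-pairing table for DNA (A-T and C-G), preserving letter case.
DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence):
    """Return the complementary strand, written 5' to 3'."""
    return sequence.translate(DNA_COMPLEMENT)[::-1]


def is_complementary(first, second):
    """Return True when every base of `first` pairs with a base of `second`.

    Assumption: both sequences are given 5' to 3' and pair antiparallel
    over their entire length.
    """
    return (
        len(first) == len(second)
        and reverse_complement(first).upper() == second.upper()
    )
```

Handling RNA (A pairing with U) would require a second translation table and is left out of this sketch.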
1- Polynucleotides

The invention also relates to variants and fragments of the polynucleotides described herein, 
particularly of an olfactory receptor gene containing one or more biallelic markers according to the 
invention.

Variants of polynucleotides, as the term is used herein, are polynucleotides that differ from a 
reference polynucleotide. A variant of a polynucleotide may be a naturally occurring variant such as 
a naturally occurring allelic variant, or it may be a variant that is not known to occur naturally. Such 
non-naturally occurring variants of the polynucleotide may be made by mutagenesis techniques, 
including those applied to polynucleotides, cells or organisms. Generally, differences are limited so
that the nucleotide sequences of the reference and the variant are closely similar overall and, in many 
regions, identical. 

Variants of polynucleotides according to the invention include, without being limited to, 
nucleotide sequences at least 95% identical to a nucleic acid selected from the group consisting of
SEQ ID Nos 1-11, or to any polynucleotide fragment of at least 12 consecutive nucleotides from a
nucleic acid selected from the group consisting of SEQ ID Nos 1-11, and preferably at least 99%
identical, more particularly at least 99.5% identical, and most preferably at least 99.8% identical to a
nucleic acid selected from the group consisting of SEQ ID Nos 1-11, or to any polynucleotide
fragment of at least 12 consecutive nucleotides from a nucleic acid selected from the group
consisting of SEQ ID Nos 1-11.
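
As an illustration of the identity thresholds listed above, percent identity between two sequences can be sketched as follows. This passage does not specify how identity is computed, so the sketch assumes the simplest case: two sequences of equal length compared position by position, without gaps. The names percent_identity and meets_identity_threshold are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Return the percentage of positions at which two sequences agree.

    Assumption: ungapped, position-by-position comparison of two
    sequences of equal, non-zero length.
    """
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def meets_identity_threshold(seq_a, seq_b, threshold=95.0):
    """Return True when the two sequences are at least `threshold` % identical."""
    return percent_identity(seq_a, seq_b) >= threshold


# One mismatch in 20 positions gives 95 % identity.
reference = "ACGT" * 5
candidate = "ACGT" * 4 + "ACGA"
identity = percent_identity(reference, candidate)
```

In practice, sequences of different lengths would first have to be aligned with a dedicated alignment tool; that step lies outside this sketch.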

Changes in the nucleotide sequence of a variant may be silent, which means that they do not alter the
amino acids encoded by the polynucleotide. However, nucleotide changes may also result in amino
acid substitutions, additions, deletions, fusions and truncations in the polypeptide encoded by the
reference sequence. The substitutions, deletions or additions may involve one or more nucleotides.
The variants may be altered in coding or non-coding regions or both. Alterations in the coding
regions may produce conservative or non-conservative amino acid substitutions, deletions or 
additions. 

In the context of the present invention, particularly preferred embodiments are those in 
which the polynucleotides encode polypeptides which retain substantially the same biological
function or activity as the mature olfactory receptor protein, or those in which the polynucleotides
encode polypeptides which maintain or increase a particular biological activity, while reducing a 
second biological activity. 

A polynucleotide fragment is a polynucleotide whose sequence is fully comprised within
part of a given nucleotide sequence, preferably the nucleotide sequence of an olfactory receptor gene
of the invention, and variants thereof. The fragment can be a portion of a coding or non-coding
region of the olfactory receptor gene cluster. Preferably, such fragments comprise at least one of the
biallelic markers A1 to A13 or the complements thereto or a biallelic marker in linkage
disequilibrium with one or more of the biallelic markers A1 to A13, for which the respective
locations in the sequence listing are provided in Table 2.

Such fragments may be "free-standing", i.e. not part of or fused to other polynucleotides, or 
they may be comprised within a single larger polynucleotide of which they form a part or region. 
However, several fragments may be comprised within a single larger polynucleotide.

As representative examples of polynucleotide fragments of the invention, there may be 
mentioned those which have from about 4, 6, 8, 15, 20, 25, 40, 10 to 30, 30 to 55, 50 to 100, 75 to 
100 or 100 to 200 nucleotides in length. Preferred are those fragments having about 47 nucleotides 
in length, such as those comprising at least one of the biallelic markers A1 to A13 of the olfactory
receptor gene. Optionally, such fragments may consist of, or consist essentially of a contiguous span
of at least 8, 10, 12, 15, 18, 20, 25, 35, 40, 50, 70, 80, 100, 250, 500 or 1000 nucleotides in length.
A set of preferred fragments contain at least one of the biallelic markers A1 to A13 of the olfactory
receptor gene which are described herein or the complements thereto. 

2- Polypeptides 

The invention also relates to variants, fragments, analogs and derivatives of the polypeptides
described herein, including mutated olfactory receptor proteins.

The variant may be 1) one in which one or more of the amino acid residues are substituted
with a conserved or non-conserved amino acid residue and such substituted amino acid residue may
or may not be one encoded by the genetic code, or 2) one in which one or more of the amino acid
residues includes a substituent group, or 3) one in which the mutated olfactory receptor is fused with
another compound, such as a compound to increase the half-life of the polypeptide (for example,
polyethylene glycol), or 4) one in which additional amino acids are fused to the mutated
olfactory receptor, such as a leader or secretory sequence or a sequence which is employed for
purification of the mutated olfactory receptor or a preprotein sequence. Such variants are deemed to
be within the scope of those skilled in the art.

In the case of an amino acid substitution in the amino acid sequence of a polypeptide
according to the invention, one or several amino acids can be replaced by "equivalent" amino acids.
The expression "equivalent" amino acid is used herein to designate any amino acid that may be
substituted for one of the amino acids having similar properties, such that one skilled in the art of
peptide chemistry would expect the secondary structure and hydropathic nature of the polypeptide to
be substantially unchanged. Generally, the following groups of amino acids represent equivalent
changes: (1) Ala, Pro, Gly, Glu, Asp, Gln, Asn, Ser, Thr; (2) Cys, Ser, Tyr, Thr; (3) Val, Ile, Leu,
Met, Ala, Phe; (4) Lys, Arg, His; (5) Phe, Tyr, Trp, His.
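
By way of illustration only, the five groups of equivalent amino acids listed above can be written as a small lookup. The following Python sketch is not part of the disclosed methods; the function name `is_equivalent_substitution` is hypothetical, and the sketch assumes that two residues are "equivalent" when they appear together in at least one of the listed groups.

```python
# Equivalence groups exactly as listed in the text (three-letter residue codes).
EQUIVALENCE_GROUPS = [
    {"Ala", "Pro", "Gly", "Glu", "Asp", "Gln", "Asn", "Ser", "Thr"},
    {"Cys", "Ser", "Tyr", "Thr"},
    {"Val", "Ile", "Leu", "Met", "Ala", "Phe"},
    {"Lys", "Arg", "His"},
    {"Phe", "Tyr", "Trp", "His"},
]


def is_equivalent_substitution(original: str, replacement: str) -> bool:
    """Return True if both residues occur together in at least one group.

    Assumption for this sketch: sharing one group is sufficient for a
    substitution to count as an "equivalent" change.
    """
    return any(
        original in group and replacement in group
        for group in EQUIVALENCE_GROUPS
    )


if __name__ == "__main__":
    print(is_equivalent_substitution("Lys", "Arg"))  # both in group (4)
    print(is_equivalent_substitution("Lys", "Asp"))  # no shared group
```

Because some residues (for example Ala, Ser, Thr, His, Phe and Tyr) belong to more than one group, the relation is not transitive: Ala is equivalent to both Gly and Val under this reading, while Gly and Val share no group.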

More particularly, a variant olfactory receptor polypeptide comprises amino acid changes
ranging from 1, 2, 3, 4, 5, 10 to 20 substitutions, additions or deletions of one amino acid, preferably
from 1 to 10, more preferably from 1 to 5 and most preferably from 1 to 3 substitutions, additions or
deletions of one amino acid. The preferred amino acid changes are those which have little or no
influence on the biological activity or the capacity of the variant olfactory receptor polypeptide to
bind to antibodies raised against a native olfactory receptor protein.

A specific, but not restrictive, embodiment of a modified peptide molecule of interest
according to the present invention, namely a peptide molecule which is resistant to
proteolysis, is a peptide in which the -CONH- peptide bond is modified and replaced by a (CH2NH)
reduced bond, a (NHCO) retro-inverso bond, a (CH2-O) methylene-oxy bond, a (CH2-S)
thiomethylene bond, a (CH2CH2) carba bond, a (CO-CH2) ketomethylene bond, a (CHOH-CH2)
hydroxyethylene bond, a (N-N) bond, an E-alkene bond or also a -CH=CH- bond.

The polypeptide according to the invention could have post-translational modifications. For
example, it can present the following modifications: acylation, disulfide bond formation,
prenylation, carboxymethylation and phosphorylation.

A polypeptide fragment is a polypeptide whose sequence is fully comprised within part of a
given polypeptide sequence, preferably a polypeptide encoded by an olfactory receptor gene and 
variants thereof. 

Such fragments may be "free-standing", i.e. not part of or fused to other polypeptides, or
they may be comprised within a single larger polypeptide of which they form a part or region.
However, several fragments may be comprised within a single larger polypeptide.

As representative examples of polypeptide fragments of the invention, there may be
mentioned those which are about 5, 6, 7, 8, 9 or 10 to 15, 10 to 20, 15 to 40, or 30 to 55
amino acids long. Preferred polypeptide fragments according to the invention comprise a contiguous
span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12, 15,
20, 25, 30, 40, 50, or 100 amino acids of one amino acid sequence. Preferred are those fragments
containing at least one amino acid mutation in the olfactory receptor protein under consideration.

Identity between nucleic acids or polypeptides 

The terms "percentage of sequence identity" and "percentage homology" are used
interchangeably herein to refer to comparisons among polynucleotides and polypeptides, and are
determined by comparing two optimally aligned sequences over a comparison window, wherein the
portion of the polynucleotide or polypeptide sequence in the comparison window may comprise
additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise
additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by
determining the number of positions at which the identical nucleic acid base or amino acid residue
occurs in both sequences to yield the number of matched positions, dividing the number of matched
positions by the total number of positions in the window of comparison and multiplying the result by
100 to yield the percentage of sequence identity. Homology is evaluated using any of the
variety of sequence comparison algorithms and programs known in the art, or by visual inspection.
Such algorithms and programs include, but are by no means limited to, TBLASTN, BLASTP,
FASTA, TFASTA, and CLUSTALW (Pearson and Lipman, 1988; Altschul et al., 1990; Thompson
et al., 1994; Higgins et al., 1996; Altschul et al., 1993). In a particularly
preferred embodiment, protein and nucleic acid sequence homologies are evaluated using the Basic
Local Alignment Search Tool ("BLAST"), which is well known in the art (see, e.g., Karlin and
Altschul, 1990; Altschul et al., 1990, 1993, 1997). In particular, five specific BLAST programs are
used to perform the following tasks:

(1) BLASTP and BLAST3 compare an amino acid query sequence against a protein 
sequence database; 

(2) BLASTN compares a nucleotide query sequence against a nucleotide sequence 
database; 

(3) BLASTX compares the six-frame conceptual translation products of a query nucleotide
sequence (both strands) against a protein sequence database;

(4) TBLASTN compares a query protein sequence against a nucleotide sequence database
translated in all six reading frames (both strands); and

(5) TBLASTX compares the six-frame translations of a nucleotide query sequence against
the six-frame translations of a nucleotide sequence database.

The BLAST programs identify homologous sequences by identifying similar segments,
which are referred to herein as "high-scoring segment pairs," between a query amino or nucleic acid
sequence and a test sequence which is preferably obtained from a protein or nucleic acid sequence
database. High-scoring segment pairs are preferably identified (i.e., aligned) by means of a scoring
matrix, many of which are known in the art. Preferably, the scoring matrix used is the BLOSUM62
matrix (Gonnet et al., 1992; Henikoff and Henikoff, 1993). Less preferably, the PAM or PAM250
matrices may also be used (see, e.g., Schwartz and Dayhoff, eds., 1978). The BLAST programs
evaluate the statistical significance of all high-scoring segment pairs identified, and preferably
select those segments which satisfy a user-specified threshold of significance, such as a
user-specified percent homology. Preferably, the statistical significance of a high-scoring segment pair is
evaluated using the statistical significance formula of Karlin (see, e.g., Karlin and Altschul, 1990).
The BLAST programs may be used with the default parameters or with modified parameters
provided by the user.
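
As an illustration of the percentage calculation defined at the beginning of this section, the following Python sketch counts matched positions over a comparison window and divides by the total number of positions in that window. It is a minimal sketch, not the BLAST implementation: the function name `percent_identity` is hypothetical, the two inputs are assumed to be already optimally aligned and of equal length, and the gap symbol `-` is an assumption of the sketch.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of sequence identity over an aligned comparison window.

    matched positions / total positions in the window * 100, where a
    position facing a gap is counted in the window but never as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matched = sum(
        1
        for base_a, base_b in zip(aligned_a, aligned_b)
        if base_a == base_b and base_a != gap
    )
    return 100.0 * matched / len(aligned_a)


if __name__ == "__main__":
    # Window of 9 positions, 7 of which carry the identical base.
    print(round(percent_identity("ACGT-ACGT", "ACGTTACGA"), 2))
```

In this example the window contains one gap and one mismatch, so 7 of 9 positions match.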

Stringent Hybridization Conditions 

By way of example and not limitation, procedures using conditions of high stringency are as
follows: Prehybridization of filters containing DNA is carried out for 8 h to overnight at 65°C in
buffer composed of 6X SSC, 50 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.02% PVP, 0.02% Ficoll,
0.02% BSA, and 500 ng/ml denatured salmon sperm DNA. Filters are hybridized for 48 h at 65°C,
the preferred hybridization temperature, in prehybridization mixture containing 100 ng/ml denatured
salmon sperm DNA and 5-20 X 10^6 cpm of 32P-labeled probe. Alternatively, the hybridization step
can be performed at 65°C in the presence of SSC buffer, 1 x SSC corresponding to 0.15 M NaCl and
0.05 M Na citrate. Subsequently, filter washes can be done at 37°C for 1 h in a solution containing 2
x SSC, 0.01% PVP, 0.01% Ficoll, and 0.01% BSA, followed by a wash in 0.1 X SSC at 50°C for 45
min. Alternatively, filter washes can be performed in a solution containing 2 x SSC and 0.1% SDS,
or 0.5 x SSC and 0.1% SDS, or 0.1 x SSC and 0.1% SDS at 68°C for 15 minute intervals.
Following the wash steps, the hybridized probes are detectable by autoradiography. Other
conditions of high stringency which may be used are well known in the art and as cited in Sambrook
et al., 1989; and Ausubel et al., 1989. These hybridization conditions are suitable for a nucleic acid
molecule of about 20 nucleotides in length. Needless to say, the hybridization conditions
described above are to be adapted according to the length of the desired nucleic acid, following
techniques well known to the one skilled in the art. The suitable hybridization conditions may for
example be adapted according to the teachings disclosed in the book of Hames and Higgins (1985)
or in Sambrook et al. (1989).

HOMOLOGIES OF THE NOVEL OLFACTORY RECEPTOR GENE WITH 
KNOWN OLFACTORY RECEPTORS 

A comparison analysis of various olfactory receptor amino acid sequences, including the
novel sequences of the invention, has been performed with the alignment program Pileup and the
translation program MAP (Wisconsin Package version 8, GCG). The protein sequences were sorted
into different families and subfamilies, taking into account their Amino acid Sequence Identity
(ASI). It was observed that the Open Reading Frames of the OLF1 to OLF10 genes are genetically
clearly distinguished from the already known olfactory receptor sequences. For example, the
olfactory receptor OLF2 presents respectively 39.9 %, 43.1 % and 44.2 % of identity with prior art
olfactory receptors referred to in Genbank as L35475, U58675_1 and Y10530. In addition, the
nucleotide sequences of Orf-2 to Orf-10 according to the invention are all grouped together, whereas
the nucleotide Orf-1 of the invention forms a new family by itself. These amino acid sequence
comparison data clearly indicate that the novel olfactory receptor sequences of the invention share
common genetic characteristics (Orf-2 to Orf-10) or have specific characteristics (Orf-1) that are not
found in the prior art olfactory receptor sequences.

A. OLF1 TO OLF10 GENE POLYNUCLEOTIDES. 

The cluster of ten olfactory receptor genes has been found by the inventors to be located on
the human chromosome 11, more precisely within the 11q12-q13 locus of said chromosome as
described in Example 1.

1. Genomic sequences of the olfactory receptor gene 

The present invention concerns the genomic sequence of an olfactory receptor cluster. The
present invention encompasses the olfactory receptor gene, or olfactory receptor genomic sequences
consisting of, consisting essentially of, or comprising the sequence of SEQ ID No 1, a sequence
complementary thereto, as well as fragments and variants thereof. These polynucleotides may be
purified, isolated, or recombinant.

The invention also encompasses a purified, isolated, or recombinant polynucleotide
comprising a nucleotide sequence having at least 70, 75, 80, 85, 90, or 95% nucleotide identity with
a nucleotide sequence of SEQ ID No 1 or a complementary sequence thereto or a fragment thereof.
The nucleotide differences with respect to the nucleotide sequence of SEQ ID No 1 may be generally
randomly distributed throughout the entire nucleic acid. Nevertheless, preferred nucleic acids are
those wherein the nucleotide differences with respect to the nucleotide sequence of SEQ ID No 1 are
predominantly located outside the coding sequences contained in the exons. These nucleic acids, as
well as their fragments and variants, may be used as oligonucleotide primers or probes in order to
detect the presence of a copy of the olfactory receptor gene in a test sample, or alternatively in order
to amplify a target nucleotide sequence within the olfactory receptor sequences.

Another object of the invention consists of a purified, isolated, or recombinant nucleic acid
that hybridizes with the nucleotide sequence of SEQ ID No 1 or a complementary sequence thereto,
under stringent hybridization conditions as defined above.

Particularly preferred nucleic acids of the invention include isolated, purified, or
recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40,
50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 1 or the complements
thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the following nucleotide
positions of SEQ ID No 1: 1-113643, 114064-127488, 127855-144460. Additional preferred nucleic
acids of the invention include isolated, purified, or recombinant polynucleotides comprising a
contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or
1000 nucleotides of SEQ ID No 1 or the complements thereof, wherein said contiguous span
comprises at least 1, 2, 3, 5, or 10 of the following nucleotide positions of SEQ ID No 1: 1-10000,
10001-20000, 20001-30000, 30001-40000, 40001-50000, 50001-60000, 60001-70000, 70001-80000,
80001-90000, 90001-100000, 100001-110000, 110001-120000, 120001-130000, 130001-140000,
and 140001-144460. Further preferred nucleic acids of the invention include isolated,
purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25,
30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 1 or the
complements thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the
following nucleotide positions of SEQ ID No 1: 1-5000, 5001-10000, 10001-15000, 15001-20000,
20001-25000, 25001-30000, 30001-35000, 35001-40000, 40001-45000, 45001-50000, 50001-55000,
55001-60000, 60001-65000, 65001-70000, 70001-75000, 75001-80000, 80001-85000,
85001-90000, 90001-95000, 95001-100000, 100001-105000, 105001-110000, 110001-115000,
115001-120000, 120001-125000, 125001-130000, 130001-135000, 135001-140000, and
140001-144460.
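
The position ranges enumerated above follow a regular pattern: consecutive, 1-based, inclusive windows of 10,000 or 5,000 nucleotides, the last one being truncated at position 144460 of SEQ ID No 1. The following Python sketch, given for illustration only, regenerates these ranges; the function name `position_windows` is hypothetical.

```python
from typing import List, Tuple


def position_windows(total_length: int, window_size: int) -> List[Tuple[int, int]]:
    """Split positions 1..total_length into consecutive inclusive windows.

    The final window is shortened so that it ends at total_length.
    """
    if total_length < 1 or window_size < 1:
        raise ValueError("total_length and window_size must be positive")
    return [
        (start, min(start + window_size - 1, total_length))
        for start in range(1, total_length + 1, window_size)
    ]


if __name__ == "__main__":
    seq_id_1_length = 144460  # length of SEQ ID No 1 as given in the text
    for first, last in position_windows(seq_id_1_length, 10000):
        print(f"{first}-{last}")
```
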
The olfactory receptor genomic nucleic acid comprises 10 open reading frames, each carried
by a single exon and encoding a polypeptide designated OLF1 to OLF10. The open reading frame
positions of OLF1 to OLF10 in SEQ ID No 1 are given as features in the sequence listing and are
also detailed below in Table A.

Two truncated ubiquitin polypeptides Ubi1 and Ubi2, unrelated to olfactory receptor coding
sequences, are encoded on the complementary strand of the olfactory receptor gene. The
complementary sequence of the Ubi1 ORF is located between the nucleotide in position 114063 and
the nucleotide in position 113644 of the nucleotide sequence of SEQ ID No 1. The complementary
sequence of the Ubi2 ORF is located between the nucleotide in position 127854 and the nucleotide
in position 127489 of the nucleotide sequence of SEQ ID No 1.



Table A

Coding regions                          Non-coding regions
Name     Position in SEQ ID No 1        Name     Position in SEQ ID No 1
         Beginning    End                        Beginning    End
OLF1     2406         2600              NC1      1            2405
OLF2     9711         10658             NC2      2601         9710
OLF3     24851        25369             NC3      10659        24850
OLF4     45714        46661             NC4      25370        45713
OLF5     80198        81115             NC5      46662        80197
OLF6     96291        96902             NC6      81116        96290
OLF7     110758       111564            NC7      96903        110757
OLF8     122525       122887            NC8      111565       122524
OLF9     132454       133389            NC9      122888       132453
OLF10    143398       143577            NC10     133390       143397
                                        NC11     143578       144460



Thus, the invention embodies purified, isolated, or recombinant polynucleotides comprising
a nucleotide sequence selected from the group consisting of the 10 open reading frames of the
olfactory receptor gene, or a sequence complementary thereto.

The nucleic acid of SEQ ID No 1 also comprises non coding portions flanking each of the 
ten olfactory receptor open reading frames of the sense DNA strand. 
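
For illustration, the non-coding regions NC1 to NC11 of Table A can be recomputed as the stretches of SEQ ID No 1 that lie before, between and after the ten open reading frames. The Python sketch below uses the coordinates of Table A; the names `ORFS`, `SEQ_LENGTH` and `non_coding_regions` are hypothetical and serve only this example.

```python
from typing import Dict, List, Tuple

SEQ_LENGTH = 144460  # length of SEQ ID No 1

# ORF coordinates (beginning, end) in SEQ ID No 1, copied from Table A.
ORFS: Dict[str, Tuple[int, int]] = {
    "OLF1": (2406, 2600),
    "OLF2": (9711, 10658),
    "OLF3": (24851, 25369),
    "OLF4": (45714, 46661),
    "OLF5": (80198, 81115),
    "OLF6": (96291, 96902),
    "OLF7": (110758, 111564),
    "OLF8": (122525, 122887),
    "OLF9": (132454, 133389),
    "OLF10": (143398, 143577),
}


def non_coding_regions(
    orfs: Dict[str, Tuple[int, int]], seq_length: int
) -> List[Tuple[int, int]]:
    """Return the 1-based inclusive intervals not covered by any ORF."""
    regions = []
    cursor = 1  # first position not yet assigned to a region
    for start, end in sorted(orfs.values()):
        if start > cursor:
            regions.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    if cursor <= seq_length:
        regions.append((cursor, seq_length))
    return regions


if __name__ == "__main__":
    for index, (start, end) in enumerate(non_coding_regions(ORFS, SEQ_LENGTH), 1):
        print(f"NC{index}: {start}-{end}")
    for name, (start, end) in ORFS.items():
        print(f"{name}: {end - start + 1} nucleotides")
```

Note that this sketch considers only the sense-strand ORFs; the Ubi1 and Ubi2 sequences on the complementary strand fall inside NC8 and NC9 respectively and are not subtracted.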

The invention also embodies purified, isolated, or recombinant polynucleotides comprising a
nucleotide sequence selected from the group consisting of the non-coding regions contained in the
olfactory receptor gene cluster of SEQ ID No 1, or a sequence complementary thereto as well as
their fragments or variants. The term "non-coding" sequence refers to any nucleotide sequence
which does not encode an amino acid. The non-coding sequences encompass upstream and
downstream regions of the olfactory receptor ORFs of the invention, as well as regions located
between two successive olfactory receptor ORFs, as indicated in Table A which lists the 11
non-coding regions named from NC1 to NC11.

The nucleic acids defining the non-coding sequences of the polynucleotide of SEQ ID No 1
described above, as well as their fragments and variants, may be used as oligonucleotide primers or
probes in order to detect the presence of a copy of one of the olfactory receptor genes of the
invention in a test sample, or alternatively in order to amplify a target nucleotide sequence within the
cluster of olfactory receptor encoding sequences according to the invention.

While this section is entitled "Genomic Sequences of the olfactory receptor gene," it should 
be noted that nucleic acid fragments of any size and sequence may also be comprised by the 
polynucleotides described in this section, flanking the genomic sequences of olfactory receptor on
either side or between two or more such genomic sequences. 

2. Coding regions of the olfactory receptor gene 

The 10 olfactory receptor open reading frames are presented individually as SEQ ID Nos 2-11
in the appended sequence listing.

Thus, another object of the invention is a purified, isolated, or recombinant nucleic acid
comprising a nucleotide sequence selected from the group consisting of SEQ ID Nos 2-11,
complementary sequences thereto, as well as allelic variants, and fragments thereof. Moreover,
preferred polynucleotides of the invention include purified, isolated, or recombinant olfactory
receptor cDNAs consisting of, consisting essentially of, or comprising a sequence selected from the
group consisting of SEQ ID Nos 2-11.

The invention also pertains to a purified or isolated nucleic acid comprising a polynucleotide
having at least 95% nucleotide identity with a polynucleotide selected from the group consisting of
SEQ ID Nos 2-11, advantageously 99% nucleotide identity, preferably 99.5% nucleotide identity
and most preferably 99.8% nucleotide identity with a polynucleotide selected from the group
consisting of SEQ ID Nos 2-11, or a sequence complementary thereto or a biologically active
fragment thereof.

Another object of the invention relates to purified, isolated or recombinant nucleic acids
comprising a polynucleotide that hybridizes, under the stringent hybridization conditions defined
herein, with a polynucleotide selected from the group consisting of SEQ ID Nos 2-11, or a sequence
complementary thereto or a biologically active fragment thereof.

Particularly preferred nucleic acids of the invention include isolated, purified, or
recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40,
50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of a sequence selected from the group
consisting of SEQ ID Nos 2-11 or the complements thereof. Additional preferred embodiments of
the invention include isolated, purified, or recombinant polynucleotides comprising a contiguous
span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000
nucleotides of a sequence selected from the group consisting of SEQ ID Nos 2-11 or the
complements thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the
following nucleotide positions of said selected sequence: 1-50, 51-100, 101-150, 151-200, 201-250,
251-300, 301-350, 351-400, 401-450, 451-500, 501-550, 551-600, 601-650, 651-700, 701-750,
751-800, 801-850, 851-900, 901-the terminal nucleotide of the olfactory receptor coding regions, to the
extent that such nucleotide positions are consistent with the lengths of the particular olfactory
receptor coding region being referred to. Further preferred embodiments of the invention include
isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15,
18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of a sequence
selected from the group consisting of SEQ ID Nos 2, 4, 7, 9 and 11, or the complements thereof,
wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the following nucleotide
positions of said selected sequence: 1-25, 26-50, 51-75, 76-100, 101-125, 126-150, 151-175,
176-200, 201-225, 226-250, 251-275, 276-300, 301-325, 326-350, 351-375, 376-400, 401-425, 426-450,
451-475, 476-500, 501-525, 526-550, 551-575, 576-the terminal nucleotide of the olfactory receptor
coding regions, to the extent that such nucleotide positions are consistent with the lengths of the
particular olfactory receptor coding region being referred to.

The present invention also embodies isolated, purified, and recombinant polynucleotides
encoding olfactory receptor polypeptides, wherein olfactory receptor polypeptides comprise an
amino acid sequence selected from the group consisting of SEQ ID Nos 12-21, a nucleotide
sequence complementary thereto, a fragment or a variant thereof. The present invention also
embodies isolated, purified, and recombinant polynucleotides which encode polypeptides
comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more
preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of a sequence selected from the
group consisting of SEQ ID Nos 12-21. In a preferred embodiment, the present invention embodies
isolated, purified, and recombinant polynucleotides which encode polypeptides comprising a
contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at
least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of a sequence selected from the group consisting
of SEQ ID Nos 12-21 wherein said contiguous span includes at least 1, 2, 3, 5 or 10 of the following
amino acid positions in said selected sequence: 1-20, 21-40, 41-60, 61-80, 81-100, 101-120,
121-140, 141-160, 161-180, 181-200, 201-220, 221-240, 241-260, 261-280, 281-300, 301-the terminal
amino acid of the olfactory receptor proteins, to the extent that such amino acid positions are
consistent with the lengths of the particular olfactory receptor protein being referred to. In another
preferred embodiment, the present invention embodies isolated, purified, and recombinant
polynucleotides which encode polypeptides comprising a contiguous span of at least 6 amino acids,
preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100
amino acids of a sequence selected from the group consisting of SEQ ID Nos 12, 14, 17, 19 or 21
wherein said contiguous span includes at least 1, 2, 3, 5 or 7 of the following amino acid positions in
said selected sequence: 1-10, 11-20, 21-30, 31-40, 41-50, 51-60, 61-70, 71-80, 81-90, 91-100,
101-110, 111-120, 121-130, 131-140, 141-150, 151-160, 161-170, 171-180, 181-190, 191-the terminal
amino acid of the olfactory receptor proteins, to the extent that such amino acid positions are
consistent with the lengths of the particular olfactory receptor protein being referred to.

In further preferred embodiments, the present invention embodies isolated, purified, and
recombinant polynucleotides which encode olfactory receptor polypeptides comprising a contiguous
span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12,
15, 20, 25, 30, 40, 50, or 100 amino acids of a sequence selected from the group consisting of SEQ
ID No 12-21, wherein said contiguous span includes at least one amino acid at the following
positions of said selected sequence:

i) 1-3, 10, 16, 21, 28, 33, 34, 36, 42-44, 46, 49, 53, 54, 57, 59, 63, and 64 for SEQ ID
No 12;

ii) 2, 4, 6, 8, 18, 25, 34, 37, 44, 52, 56, 80, 83, 89, 98, 101, 102, 113, 114, 117, 120,
139, 148, 158, 186, 195, 212, 219, 247, 266, 270, 280, 295, 298, 299, 301, 311, and
313-315 for SEQ ID No 13;

iii) 2-4, 6, 18, 21, 25, 34, 37, 98, 99, 102, 113, 114, 133, 143, 148, 158-163, 166, 167,
169, and 170 for SEQ ID No 14;

iv) 2, 4, 6, 8, 18, 25, 34, 37, 44, 52, 54, 56, 80, 83, 89, 98, 101, 102, 113, 114, 117, 120,
139, 148, 158, 186, 195, 212, 219, 247, 266, 270, 280, 298, 299, 311, and 313-315
for SEQ ID No 15;

v) 3, 18, 20, 25, 34, 47, 49, 67, 97, 100, 107, 108, 112, 113, 126, 135, 142, 146, 147,
157, 159-160, 194, 196, 228, 245, 264, 265, 269, 279, 298, and 302 for SEQ ID No
16;

vi) 2, 6, 18, 20, 33, 34, 37, 65, 68, 69, 72, 86, 88, 101, 107, 113, 114, 148, 158, 161,
164, 195, and 198 for SEQ ID No 17;

vii) 2, 6, 7, 52, 56, 67, 88, 94, 97, 110, 113, 116, 119, 120, 127, 135, 150, 153, 164, 174,
175, 180, 184, 217, 221, 259, 261, and 268 for SEQ ID No 18;

viii) 17, 18, 20, 28, 33, 35, 49-52, 105, 111, and 112 for SEQ ID No 19;

ix) 17, 20, 33, 35, 49-53, 56, 111, 112, 132, 138, 141, 147, 154, 157, 160, 163, 164,
194, 197, 204, 211, 214, 218, 219, 252, 265, 286, 295, 301, 303, 305, 306 and 309
for SEQ ID No 20; and

x) 9, 18, 26-28, 34, 47 and 50 for SEQ ID No 21, to the extent that such amino acid
lengths are consistent with the lengths of the particular olfactory receptor protein
being referred to.

Additional preferred fragments of the nucleotide sequences of SEQ ID Nos 2-11 are those
encoding olfactory receptor polypeptide fragments located outside the transmembrane domains of
the corresponding protein as located in boxes in Figure 1.

The above disclosed polynucleotides that contain only coding sequences derived from the
olfactory receptor ORFs may be expressed in a desired host cell or a desired host organism, when
said polynucleotides are placed under the control of suitable expression signals. Such a
polynucleotide, when placed under suitable expression signals, may be inserted in a vector for its
expression.

While this section is entitled "Coding regions of the olfactory receptor gene," it should be
noted that nucleic acid fragments of any size and sequence may also be comprised by the 
polynucleotides described in this section, flanking the genomic sequences of olfactory receptor on 
either side or between two or more such genomic sequences. 

3. Polynucleotide Constructs

The terms "polynucleotide construct" and "recombinant polynucleotide" are used 
interchangeably herein to refer to linear or circular, purified or isolated polynucleotides that have 
been artificially designed and which comprise at least two nucleotide sequences that are not found as 
contiguous nucleotide sequences in their initial natural environment. 

DNA Construct That Enables Directing Temporal And Spatial Olfactory Receptor Gene Expression
In Recombinant Cell Hosts And In Transgenic Animals.

In order to study the physiological and phenotypic consequences of a lack of synthesis of the
olfactory receptor protein, both at the cell level and at the multicellular organism level, the
invention also encompasses DNA constructs and recombinant vectors enabling a conditional
expression of a specific allele of the olfactory receptor genomic sequence or cDNA and also of a
copy of this genomic sequence or cDNA harboring substitutions, deletions, or additions of one or
more bases with respect to the olfactory receptor nucleotide sequence of SEQ ID Nos 1-11, or a
fragment thereof, these base substitutions, deletions or additions being located in the coding regions
of the olfactory receptor genomic sequence or within the olfactory receptor open reading frames of
SEQ ID Nos 2-11. In a preferred embodiment, the olfactory receptor sequence
comprises a biallelic marker of the present invention, preferably one of the biallelic markers A1 to
A13.

The present invention embodies recombinant vectors comprising any one of the
polynucleotides described in the present invention. More particularly, the polynucleotide constructs
according to the present invention can comprise any of the polynucleotides described in the
"Genomic sequences of the olfactory receptor gene" section, the "Coding regions of the olfactory
receptor Gene" section, and the "Oligonucleotide probes and primers" section.

DNA Constructs Allowing Homologous Recombination: Replacement Vectors

A first preferred DNA construct will comprise, from 5'-end to 3'-end: (a) a first nucleotide
sequence that is comprised in the olfactory receptor genomic sequence; (b) a nucleotide sequence
comprising a positive selection marker, such as the marker for neomycin resistance (neo); and (c) a
second nucleotide sequence that is comprised in the olfactory receptor genomic sequence, and is
located on the genome downstream of the first olfactory receptor nucleotide sequence (a).

In a preferred embodiment, this DNA construct also comprises a negative selection marker
located upstream of the nucleotide sequence (a) or downstream of the nucleotide sequence (c).
Preferably, the negative selection marker comprises the thymidine kinase (tk) gene (Thomas et al.,
1986), the hygromycin beta gene (Te Riele et al., 1990), the hprt gene (Van der Lugt et al., 1991;
Reid et al., 1990) or the Diphtheria toxin A fragment (Dt-A) gene (Nada et al., 1993; Yagi et al., 1990).
Preferably, the positive selection marker is located within an olfactory receptor open reading frame
sequence so as to interrupt the sequence encoding an olfactory receptor protein. These replacement
vectors are described, for example, by Thomas et al. (1986; 1987), Mansour et al. (1988) and Koller
et al. (1992).

The first and second nucleotide sequences (a) and (c) may be indifferently located within an
olfactory receptor regulatory sequence, an intronic sequence, an exon sequence or a sequence
containing both regulatory and/or intronic and/or exon sequences. The size of the nucleotide
sequences (a) and (c) ranges from 1 to 50 kb, preferably from 1 to 10 kb, more preferably from 2 to
6 kb and most preferably from 2 to 4 kb.

DNA Constructs Allowing Homologous Recombination: Cre-LoxP System. 

These new DNA constructs make use of the site-specific recombination system of the P1
phage. The P1 phage possesses a recombinase called Cre which interacts specifically with a 34 base
pair loxP site. The loxP site is composed of two palindromic sequences of 13 bp separated by an 8
bp conserved sequence (Hoess et al., 1986). The recombination by the Cre enzyme between two
loxP sites having an identical orientation leads to the deletion of the intervening DNA fragment.
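
The loxP architecture and the excision outcome described above can be illustrated with a minimal Python sketch. The 34 bp sequence used here is the commonly cited wild-type loxP sequence and is an assumption, since it is not taken from the present sequence listing. The function names are hypothetical, and the model ignores the biochemistry of the reaction: it simply deletes the segment lying between two directly repeated sites and leaves a single site behind.

```python
# Commonly cited wild-type loxP site (assumption: not from the sequence listing).
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"


def revcomp(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def split_loxp(site):
    """Split a 34 bp site into its 13 bp arm, 8 bp spacer and 13 bp arm."""
    return site[:13], site[13:21], site[21:]


def cre_excise(seq, site=LOXP):
    """Toy model of Cre-mediated excision between two same-orientation sites.

    The DNA between the first two direct repeats of `site` is removed,
    together with one copy of the site, so that one site remains.
    If fewer than two sites are present, the sequence is returned unchanged.
    """
    first = seq.find(site)
    second = seq.find(site, first + len(site)) if first != -1 else -1
    if second == -1:
        return seq
    return seq[: first + len(site)] + seq[second + len(site):]
```

In this sketch the two 13 bp arms are reverse complements of each other, which is what "palindromic" refers to above, while the asymmetric 8 bp spacer gives the site its orientation.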

The Cre-loxP system used in combination with a homologous recombination technique was
first described by Gu et al. (1993, 1994). Briefly, a nucleotide sequence of interest to be
inserted in a targeted location of the genome harbors at least two loxP sites in the same orientation
and located at the respective ends of a nucleotide sequence to be excised from the recombinant
genome. The excision event requires the presence of the recombinase (Cre) enzyme within the
nucleus of the recombinant cell host. The recombinase enzyme may be supplied at the desired time
by (a) incubating the recombinant cell hosts in a culture medium containing this enzyme,
injecting the Cre enzyme directly into the desired cell, such as described by Araki et al. (1995), or
lipofection of the enzyme into the cells, such as described by Baubonis et al. (1993); (b) transfecting
the cell host with a vector comprising the Cre coding sequence operably linked to a promoter
functional in the recombinant cell host, which promoter is optionally inducible, said vector being
introduced in the recombinant cell host, such as described by Gu et al. (1993) and Sauer et al. (1988); or
(c) introducing in the genome of the cell host a polynucleotide comprising the Cre coding sequence
operably linked to a promoter functional in the recombinant cell host, which promoter is optionally
inducible, said polynucleotide being inserted in the genome of the cell host either by a random
insertion event or a homologous recombination event, such as described by Gu et al. (1994).

In a specific embodiment, the vector containing the sequence to be inserted in the olfactory
receptor gene by homologous recombination is constructed in such a way that selectable markers are
flanked by loxP sites of the same orientation. It is then possible, by treatment with the Cre enzyme, to
eliminate the selectable markers while leaving the olfactory receptor sequences of interest that have
been inserted by a homologous recombination event. Again, two selectable markers are needed: a
positive selection marker to select for the recombination event and a negative selection marker to
select for the homologous recombination event. Vectors and methods using the Cre-loxP system are
described by Zou et al. (1994).

Thus, a second preferred DNA construct of the invention comprises, from 5'-end to 3'-end:
(a) a first nucleotide sequence that is comprised in the olfactory receptor genomic sequence; (b) a
nucleotide sequence comprising a polynucleotide encoding a positive selection marker, said
nucleotide sequence comprising additionally two sequences defining a site recognized by a
recombinase, such as a loxP site, the two sites being placed in the same orientation; and (c) a second
nucleotide sequence that is comprised in the olfactory receptor genomic sequence, and is located on
the genome downstream of the first olfactory receptor nucleotide sequence (a).

The sequences defining a site recognized by a recombinase, such as a loxP site, are
preferably located within the nucleotide sequence (b) at suitable locations bordering the nucleotide
sequence for which the conditional excision is sought. In one specific embodiment, two loxP sites
are located one at each side of the positive selection marker sequence, in order to allow its excision at a
desired time after the occurrence of the homologous recombination event.

In a preferred embodiment of a method using the third DNA construct described above, the
excision of the polynucleotide fragment bordered by the two sites recognized by a recombinase,
preferably two loxP sites, is performed at a desired time, due to the presence within the genome of
the recombinant host cell of a sequence encoding the Cre enzyme operably linked to a promoter
sequence, preferably an inducible promoter, more preferably a tissue-specific promoter sequence and
most preferably a promoter sequence which is both inducible and tissue-specific, such as described
by Gu et al. (1994).

The presence of the Cre enzyme within the genome of the recombinant cell host may result
from the breeding of two transgenic animals, the first transgenic animal bearing the olfactory
receptor-derived sequence of interest containing the loxP sites as described above and the second
transgenic animal bearing the Cre coding sequence operably linked to a suitable promoter sequence,
such as described by Gu et al. (1994).

Spatio-temporal control of the Cre enzyme expression may also be achieved with an
adenovirus-based vector that contains the Cre gene, thus allowing infection of cells, or in vivo
infection of organs, for delivery of the Cre enzyme, such as described by Anton and Graham (1995)
and Kanegae et al. (1995).

The DNA constructs described above may be used to introduce a desired nucleotide
sequence of the invention, preferably an olfactory receptor genomic sequence or an olfactory
receptor coding region sequence, and most preferably an altered copy of an olfactory receptor
genomic or coding region sequence, within a predetermined location of the targeted genome,
leading either to the generation of an altered copy of a targeted gene (knock-out homologous
recombination) or to the replacement of a copy of the targeted gene by another copy sufficiently
homologous to allow a homologous recombination event to occur (knock-in homologous
recombination). In a specific embodiment, the DNA constructs described above may be used to
introduce an olfactory receptor genomic sequence or an olfactory receptor coding region sequence
comprising at least one biallelic marker of the present invention, preferably at least one biallelic
marker selected from the group consisting of A1 to A13.

Nuclear Antisense DNA Constructs 

Other compositions contain a vector of the invention comprising an oligonucleotide
fragment of the nucleic sequences of SEQ ID Nos 2-11, preferably a fragment including the start codon
of the olfactory receptor gene, as an antisense tool that inhibits the expression of the corresponding
olfactory receptor gene. Preferred methods using antisense polynucleotides according to the present
invention are the procedures described by Sczakiel et al. (1995) or those described in PCT
Application No WO 95/24223.

Preferred antisense polynucleotides according to the present invention are complementary to 
a sequence of the mRNAs of olfactory receptor that contains the translation initiation codon ATG. 
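
As a hedged illustration of the preceding paragraph, the following Python sketch derives an antisense sequence complementary to a region spanning the initiation codon. It is a toy model under stated assumptions: the input is a cDNA-sense string written with T rather than U, the first ATG encountered is assumed to be the translation initiation codon, and the function names and the default flank size are hypothetical choices not specified in this application.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def antisense_over_start_codon(cdna, flank=9):
    """Return an antisense oligonucleotide covering the first ATG of `cdna`.

    `flank` nucleotides are taken on each side of the ATG where available
    (assumption: the first ATG is the initiation codon).
    """
    start = cdna.find("ATG")
    if start == -1:
        raise ValueError("no ATG found in the supplied sequence")
    lo = max(0, start - flank)
    hi = min(len(cdna), start + 3 + flank)
    return reverse_complement(cdna[lo:hi])
```

Because the result is a reverse complement, the initiation codon appears in it as CAT when the antisense sequence is read 5' to 3'.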

Preferably, the antisense polynucleotides of the invention have a 3' polyadenylation signal
that has been replaced with a self-cleaving ribozyme sequence, such that RNA polymerase II
transcripts are produced without poly(A) at their 3' ends, these antisense polynucleotides being
incapable of export from the nucleus, such as described by Liu et al. (1994). In a preferred
embodiment, these olfactory receptor antisense polynucleotides also comprise, within the ribozyme
cassette, a histone stem-loop structure to stabilize cleaved transcripts against 3'-5' exonucleolytic
degradation, such as the structure described by Eckner et al. (1991).

4. Oligonucleotide probes and primers 

Polynucleotides derived from the olfactory receptor gene are useful in order to detect the
presence of at least a copy of a nucleotide sequence of SEQ ID Nos 1-11, or a fragment,
complement, or variant thereof in a test sample, preferably a human olfactory epithelium tissue or
isolated human olfactory epithelium cells.

Particularly preferred probes and primers of the invention include isolated, purified, or
recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40,
50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 1 or the complements
thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the following nucleotide
positions of SEQ ID No 1: 1-113643, 114064-127488, 127855-144460. Additional preferred probes
and primers of the invention include isolated, purified, or recombinant polynucleotides comprising a
contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or
1000 nucleotides of SEQ ID No 1 or the complements thereof, wherein said contiguous span
comprises at least 1, 2, 3, 5, or 10 of the following nucleotide positions of SEQ ID No 1: 1-10000,
10001-20000, 20001-30000, 30001-40000, 40001-50000, 50001-60000, 60001-70000, 70001-
80000, 80001-90000, 90001-100000, 100001-110000, 110001-120000, 120001-130000, 130001-
140000, and 140001-144460. Further preferred probes and primers of the invention include isolated,
purified, or recombinant polynucleotides comprising a contiguous span of 12, 15, 18, 20, 25, 30, 35,
40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 1 or the complements
thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the following nucleotide
positions of SEQ ID No 1: 1-5000, 5001-10000, 10001-15000, 15001-20000, 20001-25000, 25001-
30000, 30001-35000, 35001-40000, 40001-45000, 45001-50000, 50001-55000, 55001-60000,
60001-65000, 65001-70000, 70001-75000, 75001-80000, 80001-85000, 85001-90000, 90001-
95000, 95001-100000, 100001-105000, 105001-110000, 110001-115000, 115001-120000, 120001-
125000, 125001-130000, 130001-135000, 135001-140000, and 140001-144460.
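
The position ranges enumerated above are consecutive, 1-based, inclusive windows over a sequence of 144460 nucleotides. A minimal Python sketch, given only for illustration, shows how such windows can be generated and how a candidate span can be checked against one of them. The function names are hypothetical, and the overlap test is a simplification that only asks whether a span shares at least one position with a window.

```python
def position_windows(total_length, window_size):
    """Return consecutive 1-based inclusive (start, end) windows.

    The last window is truncated at `total_length`.
    """
    return [
        (start, min(start + window_size - 1, total_length))
        for start in range(1, total_length + 1, window_size)
    ]


def span_touches_window(span_start, span_end, window):
    """True if the inclusive span shares at least one position with `window`."""
    lo, hi = window
    return span_start <= hi and span_end >= lo
```

For example, `position_windows(144460, 10000)` yields the 10000-nucleotide ranges listed above, and `position_windows(144460, 5000)` yields the 5000-nucleotide ranges.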

Other particularly preferred probes and primers of the invention include isolated, purified, or
recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40,
45 or 50 nucleotides of a sequence selected from the group consisting of SEQ ID Nos 2-11 or the
complements thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the
following nucleotide positions of said selected sequence: 1-50, 51-100, 101-150, 151-200, 201-250,
251-300, 301-350, 351-400, 401-450, 451-500, 501-550, 551-600, 601-650, 651-700, 701-750, 751-
800, 801-850, 851-900, 901-the terminal nucleotide of the olfactory receptor coding regions, to the
extent that such nucleotide positions are consistent with the lengths of the particular olfactory
receptor coding region being referred to. Further preferred probes and primers of the invention
include isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least
12, 15, 18, 20, 22 or 25 nucleotides of a sequence selected from the group consisting of SEQ ID Nos
2, 4, 7, 9 and 11, or the complements thereof, wherein said contiguous span comprises at least 1, 2,
3, 5, or 10 of the following nucleotide positions of said selected sequence: 1-25, 26-50, 51-75, 76-
100, 101-125, 126-150, 151-175, 176-200, 201-225, 226-250, 251-275, 276-300, 301-325, 326-350,
351-375, 376-400, 401-425, 426-450, 451-475, 476-500, 501-525, 526-550, 551-575, 576-the
terminal nucleotide of the olfactory receptor coding regions, to the extent that such nucleotide
positions are consistent with the lengths of the particular olfactory receptor coding region being
referred to.

Thus, the invention also relates to nucleic acid probes characterized in that they hybridize
specifically, under the stringent hybridization conditions defined above, with a nucleic acid selected
from the group consisting of SEQ ID Nos 1-11, a variant thereof and a sequence complementary
thereto.

In one embodiment the invention encompasses isolated, purified, and recombinant
polynucleotides consisting of, or consisting essentially of a contiguous span of 8 to 50 nucleotides of
SEQ ID No 1 and the complement thereof, wherein said span includes an olfactory receptor-related
biallelic marker in said sequence; optionally, wherein said olfactory receptor-related biallelic
marker is selected from the group consisting of A1 to A13, and the complements thereof; optionally,
wherein said contiguous span is 18 to 47 nucleotides in length and said biallelic marker is within 4
nucleotides of the center of said polynucleotide; optionally, wherein said polynucleotide consists of
said contiguous span and said contiguous span is 25 nucleotides in length and said biallelic marker is
at the center of said polynucleotide; optionally, wherein the 3' end of said contiguous span is present
at the 3' end of said polynucleotide; and optionally, wherein the 3' end of said contiguous span is
located at the 3' end of said polynucleotide and said biallelic marker is present at the 3' end of said
polynucleotide. In a preferred embodiment, said probes comprise, consist of, or consist
essentially of a sequence selected from the following sequences: P1 to P13 and the complementary
sequences thereto, for which the respective locations in the sequence listing are provided in Table 3.

In another embodiment the invention encompasses isolated, purified and recombinant
polynucleotides comprising, consisting of, or consisting essentially of a contiguous span of 8 to 50
nucleotides of SEQ ID No 1, or the complements thereof, wherein the 3' end of said contiguous span
is located at the 3' end of said polynucleotide, and wherein the 3' end of said polynucleotide is
located within 20 nucleotides upstream of an olfactory receptor-related biallelic marker in said
sequence; optionally, wherein said olfactory receptor-related biallelic marker is selected from the
group consisting of A1 to A13, and the complements thereof; optionally, wherein the 3' end of said
polynucleotide is located 1 nucleotide upstream of said olfactory receptor-related biallelic marker in
said sequence; and optionally, wherein said polynucleotide consists essentially of a sequence
selected from the following sequences: D1 to D13 and E1 to E13, for which the respective locations
in the sequence listing are provided in Table 4.
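
The two geometries described in the preceding embodiments, a 25-nucleotide span with the biallelic marker at its center and a polynucleotide whose 3' end lies 1 nucleotide upstream of the marker, can be sketched in Python as follows. This is a minimal illustration only: the function names are hypothetical, `marker_index` is a 0-based position in a sense-strand string, and the default primer length of 19 is an arbitrary value chosen within the 8 to 50 nucleotide range stated above.

```python
def centered_probe(seq, marker_index, length=25):
    """Return a span of odd `length` with the marker at its central position."""
    if length % 2 == 0:
        raise ValueError("length must be odd for the marker to be centered")
    half = length // 2
    start = marker_index - half
    if start < 0 or start + length > len(seq):
        raise ValueError("not enough flanking sequence around the marker")
    return seq[start:start + length]


def upstream_primer(seq, marker_index, length=19):
    """Return a primer whose 3' end is the base immediately upstream of the marker.

    Extending such a primer by a single nucleotide would incorporate
    the base at the marker position.
    """
    if marker_index - length < 0:
        raise ValueError("not enough upstream sequence for this primer length")
    return seq[marker_index - length:marker_index]
```

A primer for the opposite strand can be obtained in the same way by applying these functions to the reverse complement of the sequence with the correspondingly recalculated marker position.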

In a further embodiment, the invention encompasses isolated, purified, or recombinant
polynucleotides comprising, consisting of, or consisting essentially of a sequence selected from the
following sequences: B1 to B11 and C1 to C11, for which the respective locations in the sequence
listing are provided in Table 1.

In an additional embodiment, the invention encompasses polynucleotides for use in
hybridization assays, sequencing assays, and enzyme-based mismatch detection assays for
determining the identity of the nucleotide at an olfactory receptor-related biallelic marker in SEQ ID
No 1, or the complements thereof, as well as polynucleotides for use in amplifying segments of
nucleotides comprising an olfactory receptor-related biallelic marker in SEQ ID No 1, or the
complements thereof; optionally, wherein said olfactory receptor-related biallelic marker is selected
from the group consisting of A1 to A13, and the complements thereof.

A probe or a primer according to the invention is between 8 and 1000 nucleotides in
length, or is specified to be at least 12, 15, 18, 20, 25, 35, 40, 50, 60, 70, 80, 100, 250, 500 or 1000
nucleotides in length. More particularly, the length of these probes and primers can range from 8,
10, 15, 20, or 30 to 100 nucleotides, preferably from 10 to 50, more preferably from 15 to 30
nucleotides. Shorter probes and primers tend to lack specificity for a target nucleic acid sequence
and generally require cooler temperatures to form sufficiently stable hybrid complexes with the
template. Longer probes and primers are expensive to produce and can sometimes self-hybridize to
form hairpin structures. The appropriate length for primers and probes under a particular set of
assay conditions may be empirically determined by one of skill in the art. A preferred probe or
primer consists of a nucleic acid comprising a polynucleotide selected from the group of the
nucleotide sequences of P1 to P13 and the complementary sequence thereto, B1 to B11, C1 to C11,
D1 to D13, and E1 to E13.

Primers and other oligonucleotides according to the invention are synthesized to be
"substantially" complementary to a strand of the olfactory receptor gene of the invention to be
amplified. The primer sequence does not need to reflect the exact sequence of the DNA template.
Minor mismatches can be accommodated by reducing the stringency of the hybridization conditions.
Among the various methods available to design useful primers, the OSP computer software can be
used by the skilled person (see Hillier & Green, 1991). All primers contained a common upstream
oligonucleotide tail enabling the easy systematic sequencing of the resulting amplification
fragments.

The formation of stable hybrids depends on the melting temperature (Tm) of the DNA. The
Tm depends on the length of the primer or probe, the ionic strength of the solution and the G+C
content. The higher the G+C content of the primer or probe, the higher the melting temperature,
because G:C pairs are held by three hydrogen bonds whereas A:T pairs have only two. The GC content in
the probes of the invention usually ranges between 10 and 75 %, preferably between 35 and 60 %,
and more preferably between 40 and 55 %.
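
The dependence of Tm on base composition can be illustrated with a short Python sketch. It computes the G+C percentage of an oligonucleotide and a rough Tm estimate using the Wallace rule, Tm = 2 x (A+T) + 4 x (G+C) in degrees Celsius. The Wallace rule is a general rule of thumb for short oligonucleotides and is an assumption introduced here for illustration; it is not a formula given in this application, and it ignores ionic strength. The function names are hypothetical, and the default range in the last function is the most preferred 40 to 55 % range stated above.

```python
def gc_percent(oligo):
    """Return the G+C content of an oligonucleotide as a percentage."""
    oligo = oligo.upper()
    gc = oligo.count("G") + oligo.count("C")
    return 100.0 * gc / len(oligo)


def wallace_tm(oligo):
    """Rough Tm estimate in degrees Celsius (Wallace rule, short oligos only)."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


def in_preferred_gc_range(oligo, low=40.0, high=55.0):
    """True if the G+C percentage lies within [low, high]."""
    return low <= gc_percent(oligo) <= high
```

Each G or C contributes twice as much as each A or T in this estimate, which mirrors the qualitative statement above that G:C pairs stabilize the hybrid more than A:T pairs.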

The primers and probes can be prepared by any suitable method, including, for example,
cloning and restriction of appropriate sequences and direct chemical synthesis by a method such as
the phosphotriester method of Narang et al. (1979), the phosphodiester method of Brown et al. (1979),
the diethylphosphoramidite method of Beaucage et al. (1981) and the solid support method described
in EP 0 707 592.

Detection probes are generally nucleic acid sequences or uncharged nucleic acid analogs
such as, for example, peptide nucleic acids, which are disclosed in International Patent Application
WO 92/20702, or morpholino analogs, which are described in U.S. Patents Numbered 5,185,444;
5,034,506 and 5,142,047. The probe may have to be rendered "non-extendable" in that additional
dNTPs cannot be added to the probe. In and of themselves analogs usually are non-extendable, and
nucleic acid probes can be rendered non-extendable by modifying the 3' end of the probe such that
the hydroxyl group is no longer capable of participating in elongation. For example, the 3' end of
the probe can be functionalized with the capture or detection label to thereby consume or otherwise
block the hydroxyl group. Alternatively, the 3' hydroxyl group simply can be cleaved, replaced or
modified. U.S. Patent Application Serial No. 07/049,061 filed April 19, 1993 describes
modifications which can be used to render a probe non-extendable.

Any of the polynucleotides of the present invention can be labeled, if desired, by
incorporating any label known in the art to be detectable by spectroscopic, photochemical,
biochemical, immunochemical, or chemical means. For example, useful labels include radioactive
substances (including 32P, 35S, 3H, 125I), fluorescent dyes (including 5-bromodeoxyuridine,
fluorescein, acetylaminofluorene, digoxigenin) or biotin. Preferably, polynucleotides are labeled at
their 3' and 5' ends. Examples of non-radioactive labeling of nucleic acid fragments are described
in the French patent No. FR-7810975 or by Urdea et al. (1988) or Sanchez-Pescador et al. (1988). In
addition, the probes according to the present invention may have structural characteristics such that
they allow signal amplification, such structural characteristics being, for example, branched
DNA probes such as those described by Urdea et al. in 1991 or in the European patent No. EP 0 225 807
(Chiron).

A label can also be used to capture the primer, so as to facilitate the immobilization of either
the primer or a primer extension product, such as amplified DNA, on a solid support. A capture
label is attached to the primers or probes and can be a specific binding member which forms a
binding pair with the solid phase reagent's specific binding member (e.g. biotin and streptavidin).
Therefore, depending upon the type of label carried by a polynucleotide or a probe, it may be
employed to capture or to detect the target DNA. Further, it will be understood that the
polynucleotides, primers or probes provided herein may, themselves, serve as the capture label. For
example, in the case where a solid phase reagent's binding member is a nucleic acid sequence, it
may be selected such that it binds a complementary portion of a primer or probe to thereby
immobilize the primer or probe to the solid phase. In cases where a polynucleotide probe itself
serves as the binding member, those skilled in the art will recognize that the probe will contain a
sequence or "tail" that is not complementary to the target. In the case where a polynucleotide primer
itself serves as the capture label, at least a portion of the primer will be free to hybridize with a
nucleic acid on a solid phase. DNA labeling techniques are well known to the skilled technician.

The probes of the present invention are useful for a number of purposes. They can be
notably used in Southern hybridization to genomic DNA or Northern hybridization to mRNA. The
probes can also be used to detect PCR amplification products. They may also be used to detect
mismatches in the OLF1 to OLF10 genes or mRNA using other techniques. Generally, the probes
are complementary to the OLF1 to OLF10 gene coding sequences, although probes complementary
to non-coding sequences are also contemplated. The probes of the present invention can also be
useful for genotyping the biallelic markers of the cluster of olfactory receptor genes of the present
invention.

Any of the polynucleotides, primers and probes of the present invention can be conveniently
immobilized on a solid support. Solid supports are known to those skilled in the art and include the
walls of wells of a reaction tray, test tubes, polystyrene beads, magnetic beads, nitrocellulose strips,
membranes, microparticles such as latex particles, sheep (or other animal) red blood cells, duracytes
and others. The solid support is not critical and can be selected by one skilled in the art. Thus, latex
particles, microparticles, magnetic or non-magnetic beads, membranes, plastic tubes, walls of
microtiter wells, glass or silicon chips, sheep (or other suitable animal's) red blood cells and
duracytes are all suitable examples. Suitable methods for immobilizing nucleic acids on solid phases
include ionic, hydrophobic, covalent interactions and the like. A solid support, as used herein, refers
to any material which is insoluble, or can be made insoluble by a subsequent reaction. The solid
support can be chosen for its intrinsic ability to attract and immobilize the capture reagent.
Alternatively, the solid phase can retain an additional receptor which has the ability to attract and
immobilize the capture reagent. The additional receptor can include a charged substance that is
oppositely charged with respect to the capture reagent itself or to a charged substance conjugated to
the capture reagent. As yet another alternative, the receptor molecule can be any specific binding
member which is immobilized upon (attached to) the solid support and which has the ability to
immobilize the capture reagent through a specific binding reaction. The receptor molecule enables
the indirect binding of the capture reagent to a solid support material before the performance of the
assay or during the performance of the assay. The solid phase thus can be a plastic, derivatized
plastic, magnetic or non-magnetic metal, glass or silicon surface of a test tube, microtiter well, sheet,
bead, microparticle, chip, sheep (or other suitable animal's) red blood cells, duracytes® and other
configurations known to those of ordinary skill in the art. The polynucleotides of the invention can
be attached to or immobilized on a solid support individually or in groups of at least 2, 5, 8, 10, 12,
15, 20, or 25 distinct polynucleotides of the invention to a single solid support. In addition,
polynucleotides other than those of the invention may be attached to the same solid support as one or
more polynucleotides of the invention.

Consequently, the invention also comprises a method for detecting the presence of a nucleic
acid comprising a nucleotide sequence selected from a group consisting of SEQ ID Nos 1-11, a
fragment or a variant thereof and a complementary sequence thereto in a sample, said method
comprising the following steps of:

a) bringing into contact a nucleic acid probe or a plurality of nucleic acid probes which can
hybridize with a nucleotide sequence selected from the group consisting of the nucleotide sequences
of SEQ ID Nos 1-11, a fragment or a variant thereof and a complementary sequence thereto and the
sample to be assayed; and

b) detecting the hybrid complex formed between the probe and a nucleic acid in the sample.

The invention further concerns a kit for detecting the presence of a nucleic acid comprising a
nucleotide sequence selected from a group consisting of SEQ ID Nos 1-11, a fragment or a variant
thereof and a complementary sequence thereto in a sample, said kit comprising:

a) a nucleic acid probe or a plurality of nucleic acid probes which can hybridize with a
nucleotide sequence selected from the group consisting of the nucleotide sequences of SEQ ID Nos
1-11, a fragment or a variant thereof and a complementary sequence thereto; and

b) optionally, the reagents necessary for performing the hybridization reaction.

In a first preferred embodiment of this detection method and kit, said nucleic acid probe or
the plurality of nucleic acid probes are labeled with a detectable molecule. In a second preferred
embodiment of said method and kit, said nucleic acid probe or the plurality of nucleic acid probes
has been immobilized on a substrate. In a third preferred embodiment, the nucleic acid probe or the
plurality of nucleic acid probes comprise either a sequence which is selected from the group
consisting of the nucleotide sequences of P1 to P13 and the complementary sequence thereto, B1 to
B11, C1 to C11, D1 to D13, E1 to E13 or a biallelic marker selected from the group consisting of A1
to A13 and the complements thereto.

Oligonucleotide arrays 

A substrate comprising a plurality of oligonucleotide primers or probes of the invention may
be used either for detecting or amplifying targeted sequences in the olfactory receptor gene and may
also be used for detecting mutations in the coding or in the non-coding sequences of the olfactory
receptor gene.

Any polynucleotide provided herein may be attached in overlapping areas or at random
locations on the solid support. Alternatively, the polynucleotides of the invention may be attached in
an ordered array wherein each polynucleotide is attached to a distinct region of the solid support
which does not overlap with the attachment site of any other polynucleotide. Preferably, such an
ordered array of polynucleotides is designed to be "addressable" where the distinct locations are
recorded and can be accessed as part of an assay procedure. Addressable polynucleotide arrays
typically comprise a plurality of different oligonucleotide probes that are coupled to a surface of a
substrate in different known locations. The knowledge of the precise location of each
polynucleotide makes these "addressable" arrays particularly useful in hybridization
assays. Any addressable array technology known in the art can be employed with the
polynucleotides of the invention. One particular embodiment of these polynucleotide arrays is
known as the Genechips™, and has been generally described in US Patent 5,143,854; PCT
publications WO 90/15070 and 92/10092. These arrays may generally be produced using
mechanical synthesis methods or light directed synthesis methods which incorporate a combination
of photolithographic methods and solid phase oligonucleotide synthesis (Fodor et al., 1991). The
immobilization of arrays of oligonucleotides on solid supports has been rendered possible by the
development of a technology generally identified as "Very Large Scale Immobilized Polymer
Synthesis" (VLSIPS™) in which, typically, probes are immobilized in a high density array on a
solid surface of a chip. Examples of VLSIPS™ technologies are provided in US Patents 5,143,854
and 5,412,087 and in PCT Publications WO 90/15070, WO 92/10092 and WO 95/11995, which
describe methods for forming oligonucleotide arrays through techniques such as light-directed
synthesis techniques. In designing strategies aimed at providing arrays of nucleotides immobilized
on solid supports, further presentation strategies were developed to order and display the
oligonucleotide arrays on the chips in an attempt to maximize hybridization patterns and sequence
information. Examples of such presentation strategies are disclosed in PCT Publications WO
94/12305, WO 94/11530, WO 97/29212 and WO 97/31256.

In another embodiment of the oligonucleotide arrays of the invention, an oligonucleotide
probe matrix may advantageously be used to detect mutations occurring in the olfactory receptor
gene. For this particular purpose, probes are specifically designed to have a nucleotide sequence
allowing their hybridization to the genes that carry known mutations (either by deletion, insertion or
substitution of one or several nucleotides). By known mutations is meant mutations on the
olfactory receptor gene that have been identified according to, for example, the technique used by
Huang et al. (1996) or Samson et al. (1996).

Another technique that is used to detect mutations in the olfactory receptor gene is the use of
a high-density DNA array. Each oligonucleotide probe constituting a unit element of the high
density DNA array is designed to match a specific subsequence of the olfactory receptor genomic
DNA or cDNA. Thus, an array consisting of oligonucleotides complementary to subsequences of
the target gene sequence is used to determine the identity of the target sequence with the wild gene
sequence, measure its amount, and detect differences between the target sequence and the reference
wild gene sequence of the olfactory receptor gene. In one such design, termed a 4L tiled array, a set
of four probes (A, C, G, T), preferably 15-nucleotide oligomers, is implemented for each position
of the reference sequence. In each set of
four probes, the perfect complement will hybridize more strongly than mismatched probes.
Consequently, a nucleic acid target of length L is scanned for mutations with a tiled array containing
4L probes, the whole probe set containing all the possible mutations in the known wild reference
sequence. The hybridization signals of the 15-mer probe set tiled array are perturbed by a single
base change in the target sequence. As a consequence, there is a characteristic loss of signal or a
"footprint" for the probes flanking a mutation position. This technique was described by Chee et al.
in 1996.
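
The 4L tiling principle and the resulting "footprint" can be illustrated with a simplified Python sketch. It is a toy model under stated assumptions: hybridization is reduced to exact complementarity (a probe either matches perfectly or gives no signal), probes are built only for positions where a full 15-nucleotide window fits inside the reference (so fewer than 4L probes are produced near the ends), and all function names are hypothetical.

```python
def revcomp(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def tiled_array(reference, probe_len=15):
    """Build four probes per interrogated position of `reference`.

    For each 0-based center position, the four probes are complementary to the
    reference window with A, C, G or T substituted at the central position.
    """
    half = probe_len // 2
    array = {}
    for center in range(half, len(reference) - half):
        window = reference[center - half:center + half + 1]
        array[center] = {
            base: revcomp(window[:half] + base + window[half + 1:])
            for base in "ACGT"
        }
    return array


def call_bases(array, target, probe_len=15):
    """Return, per position, the base whose probe perfectly matches `target`.

    `None` means that no probe of the set matches, which is what happens for
    probe sets flanking a mutation (the "footprint").
    """
    half = probe_len // 2
    calls = {}
    for center, probes in array.items():
        window = target[center - half:center + half + 1]
        hits = [base for base, probe in probes.items() if revcomp(probe) == window]
        calls[center] = hits[0] if hits else None
    return calls
```

In this model, a single substitution in the target is called correctly at its own position, while the probe sets centered within 7 nucleotides on either side lose their signal because their windows contain the substitution at a non-central position.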

Consequently, the invention concerns an array of nucleic acid molecules comprising at least
one polynucleotide described above as probes and primers. Preferably, the invention concerns an
array of nucleic acids comprising at least two polynucleotides described above as probes and primers.
A further object of the invention consists of an array of nucleic acid sequences comprising
either at least one of the sequences selected from the group consisting of P1 to P13, B1 to B11, C1 to
C11, D1 to D13, E1 to E13, the sequences complementary thereto, a fragment thereof of at least 8,
10, 12, 15, 18, 20, 25, 30, or 40 consecutive nucleotides thereof, and at least one sequence
comprising a biallelic marker selected from the group consisting of A1 to A13 and the complements
thereto.

The invention also pertains to an array of nucleic acid sequences comprising either at least
two of the sequences selected from the group consisting of P1 to P13, B1 to B11, C1 to C11, D1 to
D13, E1 to E13, the sequences complementary thereto, a fragment thereof of at least 8 consecutive
nucleotides thereof, and at least two sequences comprising a biallelic marker selected from the group
consisting of A1 to A13 and the complements thereof.

B. OLF1 TO OLF10 PROTEINS AND POLYPEPTIDE FRAGMENTS

The proteins encoded by the Open Reading Frames of the OLF1 to OLF10 genes are listed 
individually in the sequence listing as SEQ ID Nos 12-21. 

The term "olfactory receptor polypeptides" is used herein to embrace all of the proteins and
polypeptides of the present invention. Also forming part of the invention are polypeptides encoded
by the polynucleotides of the invention, as well as fusion polypeptides comprising such
polypeptides. The invention embodies olfactory receptor proteins from humans, including isolated
or purified olfactory receptor proteins consisting of, consisting essentially of, or comprising the
sequences of SEQ ID Nos 12-21 or naturally-occurring variants or fragments thereof.

The present invention embodies isolated, purified, and recombinant polypeptides comprising
a contiguous span of at least 6 amino acids, preferably at least 8 or 10 amino acids, more preferably
at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID Nos 12-21. In a preferred
embodiment, the present invention embodies isolated, purified, and recombinant polypeptides
comprising a contiguous span of at least 6 amino acids, preferably at least 8 or 10 amino acids, more
preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID Nos 12-21 wherein said
contiguous span includes at least 1, 2, 3, 5 or 10 of the following amino acid positions in SEQ ID
Nos 12-21: 1-20, 21-40, 41-60, 61-80, 81-100, 101-120, 121-140, 141-160, 161-180, 181-200,
201-220, 221-240, 241-260, 261-280, 281-300, and 301 to the terminal amino acid of the olfactory
receptor proteins, to the extent that such amino acid positions are consistent with the lengths of the
particular olfactory receptor protein being referred to. In another preferred embodiment, the present
invention embodies isolated, purified, and recombinant polypeptides comprising a contiguous span of
at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25,
30, 40, 50, or 100 amino acids of a sequence selected from the group consisting of SEQ ID Nos 12,
14, 17, 19 and 21 wherein said contiguous span includes at least 1, 2, 3, 5 or 10 of the following
amino acid positions of said selected sequence: 1-10, 11-20, 21-30, 31-40, 41-50, 51-60, 61-70,
71-80, 81-90, 91-100, 101-110, 111-120, 121-130, 131-140, 141-150, 151-160, 161-170, 171-180,
181-190, and 191 to the terminal amino acid of the olfactory receptor proteins, to the extent that such
amino acid positions are consistent with the lengths of the particular olfactory receptor protein being
referred to.
In further preferred embodiments, the present invention embodies isolated, purified, and
recombinant polypeptides comprising a contiguous span of at least 6 amino acids, preferably at least
8 or 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of a
sequence selected from the group consisting of SEQ ID Nos 12-21, wherein said contiguous span
includes at least one amino acid at the following positions of said selected sequence:

i) 1-3, 10, 16, 21, 28, 33, 34, 36, 42-44, 46, 49, 53, 54, 57, 59, 63, and 64 for SEQ ID
No 12;

ii) 2, 4, 6, 8, 18, 25, 34, 37, 44, 52, 56, 80, 83, 89, 98, 101, 102, 113, 114, 117, 120,
139, 148, 158, 186, 195, 212, 219, 247, 266, 270, 280, 295, 298, 299, 301, 311, and
313-315 for SEQ ID No 13;

iii) 2-4, 6, 18, 21, 25, 34, 37, 98, 99, 102, 113, 114, 133, 143, 148, 158-163, 166, 167,
169, and 170 for SEQ ID No 14;

iv) 2, 4, 6, 8, 18, 25, 34, 37, 44, 52, 54, 56, 80, 83, 89, 98, 101, 102, 113, 114, 117, 120,
139, 148, 158, 186, 195, 212, 219, 247, 266, 270, 280, 298, 299, 311, and 313-315
for SEQ ID No 15;

v) 3, 18, 20, 25, 34, 47, 49, 67, 97, 100, 107, 108, 112, 113, 126, 135, 142, 146, 147,
157, 159-160, 194, 196, 228, 245, 264, 265, 269, 279, 298, and 302 for SEQ ID No
16;

vi) 2, 6, 18, 20, 33, 34, 37, 65, 68, 69, 72, 86, 88, 101, 107, 113, 114, 148, 158, 161,
164, 195, and 198 for SEQ ID No 17;

vii) 2, 6, 7, 52, 56, 67, 88, 94, 97, 110, 113, 116, 119, 120, 127, 135, 150, 153, 164, 174,
175, 180, 184, 217, 221, 259, 261, and 268 for SEQ ID No 18;

viii) 17, 18, 20, 28, 33, 35, 49-52, 105, 111, and 112 for SEQ ID No 19;

ix) 17, 20, 33, 35, 49-53, 56, 111, 112, 132, 138, 141, 147, 154, 157, 160, 163, 164,
194, 197, 204, 211, 214, 218, 219, 252, 265, 286, 295, 301, 303, 305, 306 and 309
for SEQ ID No 20; and

x) 9, 18, 26-28, 34, 47 and 50 for SEQ ID No 21,

to the extent that such amino acid positions are consistent with the lengths of the particular
olfactory receptor protein being referred to.

Other preferred OLF1 to OLF10 polypeptide fragments are those located outside the
transmembrane domains, most preferably peptide fragments naturally exposed on the cell
membrane, particularly those that are available for binding to ligand molecules, either odorant
substances or molecules or antibodies directed to the olfactory receptor polypeptides of the
invention. Such transmembrane domains TM1 to TM7 are boxed in Figure 1. In other preferred
embodiments the contiguous stretch of amino acids comprises the site of a mutation or functional
mutation, including a deletion, addition, swap or truncation of the amino acids in the olfactory
receptor protein sequence.

The invention also encompasses purified, isolated, or recombinant polypeptides
comprising an amino acid sequence having at least 70, 75, 80, 85, 90, 95, 98 or 99% amino acid
identity with the amino acid sequence of SEQ ID Nos 12-21 or a fragment thereof.
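As a rough illustration of how a percentage of amino acid identity might be checked against the thresholds listed above, the following Python sketch compares two sequences that have already been aligned. It is only a sketch under stated assumptions: the alignment itself is assumed to come from an external tool, gaps are written as "-", identity is counted over all aligned columns, and the function names and example sequences are hypothetical. The governing definition of identity is the one given in the "Definitions" section of this application.

```python
# Thresholds named in the text above, in percent.
THRESHOLDS = (70, 75, 80, 85, 90, 95, 98, 99)


def percent_identity(aligned_a, aligned_b):
    """Percent of aligned columns holding the same residue in both sequences.

    Assumption: columns that are gaps in both sequences are ignored, and a
    gap facing a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def highest_threshold_met(identity):
    """Return the largest listed threshold that the identity reaches, or None."""
    met = [t for t in THRESHOLDS if identity >= t]
    return max(met) if met else None


# Hypothetical aligned fragments, for illustration only.
identity = percent_identity("MKTLLV-AGG", "MKTLIVQAGG")
level = highest_threshold_met(identity)
```

Other conventions exist (for example, dividing by the length of the shorter sequence), so the value obtained depends on the convention chosen.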

The invention also encompasses an olfactory receptor polypeptide or a fragment or a variant
thereof in which at least one peptide bond has been modified as defined in the "Definitions"
section.

A further object of the invention concerns a purified or isolated polypeptide which is
encoded by a nucleic acid comprising a nucleotide sequence selected from the group consisting of
SEQ ID Nos 1-11 or a fragment or variant thereof.

Such mutated olfactory receptor proteins may be the target of diagnostic tools, such as
specific monoclonal or polyclonal antibodies, useful for detecting the mutated olfactory receptor
proteins in a sample.

Olfactory receptor proteins are preferably isolated from human or mammalian tissue samples 
or expressed from human or mammalian genes. 

The olfactory receptor polypeptides of the invention may be extracted from cells or tissues of
humans or non-human animals. Methods for purifying proteins are known in the art, and include the
use of detergents or chaotropic agents to disrupt particles followed by differential extraction and
separation of the polypeptides by ion exchange chromatography, affinity chromatography,
sedimentation according to density, and gel electrophoresis.

In addition, shorter protein fragments may also be prepared by the conventional methods of
chemical synthesis, either in a homogenous solution or in solid phase. As an illustrative embodiment
of such chemical polypeptide synthesis techniques, the homogenous solution technique described
by Houbenweyl in 1974 may be cited. For solid phase synthesis the technique described by
Merrifield (1965) may be used in particular.

Alternatively, the proteins of the invention can be made using routine expression methods
known in the art as described below and in the section "Expression of an OLF1 to OLF10 coding
polynucleotide". Briefly, the polynucleotide encoding the desired polypeptide is ligated into an
expression vector suitable for any convenient host. Both eukaryotic and prokaryotic host systems may
be used in forming recombinant polypeptides. The polypeptide is then isolated from lysed cells or from
the culture medium and purified to the extent needed for its intended use. Purification is by any
technique known in the art, for example, differential extraction, salt fractionation, chromatography,
centrifugation, and the like. See, for example, Methods in Enzymology for a variety of methods for
purifying proteins.

Any olfactory receptor cDNA, including SEQ ID Nos 12-21, may be used to express olfactory
receptor proteins and polypeptides. The nucleic acid encoding the olfactory receptor protein or
polypeptide to be expressed is operably linked to a promoter in an expression vector using conventional
cloning technology. The olfactory receptor insert in the expression vector may comprise the full coding
sequence for the olfactory receptor protein or a portion thereof. For example, the olfactory receptor
derived insert may encode a polypeptide comprising at least 10 consecutive amino acids of the olfactory
receptor protein of SEQ ID Nos 12-21, including any of the polypeptide fragments defined in this
section.

The expression vector may be any of the mammalian, yeast, insect or bacterial expression systems
known in the art. Commercially available vectors and expression systems are available from a variety
of suppliers including Genetics Institute (Cambridge, MA), Stratagene (La Jolla, California), Promega
(Madison, Wisconsin), and Invitrogen (San Diego, California). If desired, to enhance expression and
facilitate proper protein folding, the codon context and codon pairing of the sequence may be optimized
for the particular expression organism in which the expression vector is introduced, as explained by
Hatfield, et al., U.S. Patent No. 5,082,767.

In one embodiment, the entire coding sequence of the olfactory receptor cDNA through the
poly A signal of the cDNA is operably linked to a promoter in the expression vector. Alternatively, if
the nucleic acid encoding a portion of the olfactory receptor protein lacks a methionine to serve as the
initiation site, an initiating methionine can be introduced next to the first codon of the nucleic acid using
conventional techniques. Similarly, if the insert from the olfactory receptor cDNA lacks a poly A
signal, this sequence can be added to the construct by, for example, splicing out the poly A signal from
pSG5 (Stratagene) using BglI and SalI restriction endonuclease enzymes and incorporating it into the
mammalian expression vector pXT1 (Stratagene).

The ligated product is transfected into mouse NIH 3T3 cells using Lipofectin (Life
Technologies, Inc., Grand Island, New York) under conditions outlined in the product specification.
Positive transfectants are selected after growing the transfected cells in 600 µg/ml G418 (Sigma, St.
Louis, Missouri).

The above procedures may also be used to express a mutant olfactory receptor protein 
responsible for a detectable phenotype or a portion thereof. 

Purification of the recombinant protein or peptide according to the present invention may be
realized by passage onto a Nickel or Copper affinity chromatography column. The Nickel
chromatography column may contain the Ni-NTA resin (Porath et al., 1975). The polypeptides or
peptides thus obtained may be purified, for example by high performance liquid chromatography,
such as reverse phase and/or cationic exchange HPLC, as described by Rougeot et al. (1994). The
reason to prefer this kind of peptide or protein purification is the lack of side products found in the
elution samples which renders the resultant purified protein or peptide more suitable for a
therapeutic use.

The expressed protein may also be purified using other conventional purification techniques
such as ammonium sulfate precipitation or chromatographic separation based on size or charge. The
protein encoded by the nucleic acid insert may also be purified using standard immunochromatography
techniques. In such procedures, polyclonal or monoclonal antibodies capable of specifically binding to
the expressed olfactory receptor proteins of SEQ ID Nos 12-21, or a fragment or a variant thereof, have
been previously immobilized onto a chromatography matrix. Such antibodies are described in the
section "Antibodies that bind olfactory receptor polypeptides" below. Then, a solution containing the
expressed olfactory receptor protein or portion thereof, such as a cell extract, is applied to the
chromatography column in conditions allowing the expressed protein to bind to the antibodies in the
immunochromatography column. Thereafter, the column is washed to remove non-specifically bound
proteins. The specifically bound expressed protein is then released from the column and recovered
using standard techniques.

If antibody production is not possible, the nucleic acid encoding the olfactory receptor protein
or a portion thereof may be incorporated into expression vectors designed for use in purification schemes
employing chimeric polypeptides. In such strategies the nucleic acid encoding the olfactory receptor
protein or a portion thereof is inserted in frame with the gene encoding the other half of the chimera.
The other half of the chimera may be a β-globin or a nickel binding polypeptide encoding sequence. A
chromatography matrix having antibody to β-globin or nickel attached thereto is then used to purify the
chimeric protein. Protease cleavage sites may be engineered between the β-globin gene or the nickel
binding polypeptide and the olfactory receptor protein or portion thereof. Thus, the two polypeptides of
the chimera may be separated from one another by protease digestion.

One useful expression vector for generating β-globin chimeric proteins is pSG5 (Stratagene),
which encodes rabbit β-globin. Intron II of the rabbit β-globin gene facilitates splicing of the expressed
transcript, and the polyadenylation signal incorporated into the construct increases the level of
expression. These techniques are well known to those skilled in the art of molecular biology. Standard
methods are published in methods texts such as Davis et al. (1986) and many of the methods are
available from Stratagene, Life Technologies, Inc., or Promega. Polypeptide may additionally be
produced from the construct using in vitro translation systems such as the In vitro Express™ Translation
Kit (Stratagene).

To confirm expression of the olfactory receptor protein or a portion thereof, the proteins
expressed from host cells containing an expression vector containing an insert encoding the olfactory
receptor protein or a portion thereof can be compared to the proteins expressed in host cells containing
the expression vector without an insert. The presence of a band in samples from cells containing the
expression vector with an insert which is absent in samples from cells containing the expression vector
without an insert indicates that the olfactory receptor protein or a portion thereof is being expressed.
Generally, the band will have the mobility expected for the olfactory receptor protein or portion thereof.
However, the band may have a mobility different than that expected as a result of modifications such as
glycosylation, ubiquitination, or enzymatic cleavage.

Other suitable techniques for producing and purifying the olfactory receptor proteins of the
invention or their fragments or variants are also described under the heading "Methods for
screening substances or molecules interacting with an olfactory receptor protein".

Thus, the present invention also concerns a method for producing a polypeptide of the
invention, and especially a polypeptide selected from the group of SEQ ID Nos 12-21 or a fragment
or a variant thereof, wherein said method comprises the steps of:

a) culturing, in an appropriate culture medium, a cell host previously transformed or
transfected with the recombinant vector comprising a nucleic acid encoding an olfactory receptor
polypeptide of the invention, or a fragment or a variant thereof;

b) harvesting the culture medium thus conditioned or lysing the cell host, for example by
sonication or by an osmotic shock;

c) separating or purifying, from the said culture medium, or from the pellet of the resultant
host cell lysate, the thus produced polypeptide of interest; and

d) optionally characterizing the produced polypeptide of interest.

In a specific embodiment of the above method, step a) is preceded by a step wherein the
nucleic acid coding for an olfactory receptor polypeptide, or a fragment or a variant thereof, is
inserted in an appropriate vector, optionally after an appropriate cleavage of this amplified nucleic
acid with one or several restriction endonucleases. The nucleic acid coding for an olfactory receptor
polypeptide or a fragment or a variant thereof may be the resulting product of an amplification
reaction using a pair of primers according to the invention (by PCR, SDA, TAS, 3SR, NASBA, TMA
etc.).

C. ANTIBODIES THAT BIND OLFACTORY RECEPTOR POLYPEPTIDES 

Any olfactory receptor polypeptide or whole protein may be used to generate antibodies
capable of specifically binding to an expressed olfactory receptor protein or fragments thereof as
described.

One antibody composition of the invention is capable of specifically binding, or specifically
binds, to a variant of the olfactory receptor protein of SEQ ID Nos 12-21. For an antibody
composition to specifically bind to a first variant of olfactory receptor protein, it must demonstrate at
least a 5%, 10%, 15%, 20%, 25%, 50%, or 100% greater binding affinity for a first variant of the
olfactory receptor protein than for a second variant of the olfactory receptor protein in an ELISA,
RIA, or other antibody-based binding assay.

In a preferred embodiment, the invention concerns antibody compositions, either polyclonal
or monoclonal, capable of selectively binding, or that selectively bind, to an epitope-containing
polypeptide comprising any of the fragments described in the section "OLF1 to OLF10 proteins and
polypeptide fragments". Preferred peptide fragments are portions of OLF1 to OLF10 polypeptides
that are located outside the transmembrane domains, most preferably peptide fragments naturally
exposed on the cell membrane, particularly those that are available for binding to ligand molecules,
either odorant substances or molecules or antibodies directed to the olfactory receptor polypeptides
of the invention.

The invention also concerns a purified or isolated antibody capable of specifically binding to
a mutated olfactory receptor protein or to a fragment or variant thereof comprising an epitope of the
mutated olfactory receptor protein. In another preferred embodiment, the present invention concerns
an antibody capable of binding to a polypeptide comprising at least 10 consecutive amino acids of an
olfactory receptor protein.

In a preferred embodiment, the invention concerns the use in the manufacture of antibodies
of a polypeptide comprising any of the fragments described in the section "OLF1 to OLF10 proteins
and polypeptide fragments". Preferred peptide fragments are portions of OLF1 to OLF10
polypeptides that are located outside the transmembrane domains, most preferably peptide fragments
naturally exposed on the cell membrane, particularly those that are available for recognition of
ligand molecules, either odorant substances or molecules or antibodies directed to the olfactory
receptor polypeptides of the invention.

The olfactory receptor expressed from a DNA comprising at least one of the nucleic
sequences of SEQ ID Nos 1-11 or a fragment or a variant thereof may also be used to generate
antibodies capable of specifically binding to the expressed olfactory receptor or fragments or
variants thereof. In a preferred embodiment, any of the polynucleotide fragments encoding a
polypeptide described in the section "Coding regions of the olfactory receptor gene" may be used to
generate such antibodies.

Substantially pure protein or polypeptide is isolated from transfected or transformed cells
containing an expression vector encoding the olfactory receptor protein or a portion thereof. The
concentration of protein in the final preparation is adjusted, for example, by concentration on an
Amicon filter device, to the level of a few micrograms/ml. Monoclonal or polyclonal antibodies to the
protein can then be prepared as follows:

1. Monoclonal Antibody Production by Hybridoma Fusion 

Monoclonal antibody to epitopes in the olfactory receptor of the present invention or a portion
thereof can be prepared from murine hybridomas according to the classical method of Kohler and
Milstein (1975) or derivative methods thereof. Briefly, a mouse is repetitively inoculated with a few
micrograms of the considered olfactory receptor or a portion thereof over a period of a few weeks. The
mouse is then sacrificed, and the antibody producing cells of the spleen isolated. The spleen cells are
fused by means of polyethylene glycol with mouse myeloma cells, and the excess unfused cells
destroyed by growth of the system on selective media comprising aminopterin (HAT media). The
successfully fused cells are diluted and aliquots of the dilution placed in wells of a microtiter plate
where growth of the culture is continued. Antibody-producing clones are identified by detection of
antibody in the supernatant fluid of the wells by immunoassay procedures, such as ELISA, as originally
described by Engvall (1980), and derivative methods thereof. Selected positive clones can be expanded
and their monoclonal antibody product harvested for use. Detailed procedures for monoclonal antibody
production are described in Davis, L. et al.

2. Polyclonal Antibody Production by Immunization 

Polyclonal antiserum containing antibodies to heterogeneous epitopes in the olfactory receptor
of the present invention or a portion thereof can be prepared by immunizing suitable animals with the
considered olfactory receptor or a portion thereof, which can be unmodified or modified to enhance
immunogenicity. A suitable non-human animal, preferably a non-human mammal, is selected,
usually a mouse, rat, rabbit, goat, or horse. Alternatively, a crude preparation which has been
enriched for olfactory receptor concentration can be used to generate antibodies. Such proteins,
fragments or preparations are introduced into the non-human mammal in the presence of an
appropriate adjuvant (e.g. aluminum hydroxide, RIBI, etc.) which is known in the art. In addition,
the protein, fragment or preparation can be pretreated with an agent which will increase antigenicity.
Such agents are known in the art and include, for example, methylated bovine serum albumin
(mBSA), bovine serum albumin (BSA), Hepatitis B surface antigen, and keyhole limpet hemocyanin
(KLH). Serum from the immunized animal is collected, treated and tested according to known
procedures. If the serum contains polyclonal antibodies to undesired epitopes, the polyclonal
antibodies can be purified by immunoaffinity chromatography.

Effective polyclonal antibody production is affected by many factors related both to the antigen
and the host species. Also, host animals vary in response to site of inoculations and dose, with both
inadequate and excessive doses of antigen resulting in low titer antisera. Small doses (ng level) of antigen
administered at multiple intradermal sites appear to be most reliable. Techniques for producing and
processing polyclonal antisera are known in the art; see, for example, Mayer and Walker (1987). An
effective immunization protocol for rabbits can be found in Vaitukaitis, J. et al. (1971).

Booster injections can be given at regular intervals, and antiserum harvested when the antibody
titer, as determined semi-quantitatively, for example, by double immunodiffusion in agar against
known concentrations of the antigen, begins to fall. See, for example, Ouchterlony et al. (1973).
Plateau concentration of antibody is usually in the range of 0.1 to 0.2 mg/ml of serum. Affinity of the
antisera for the antigen is determined by preparing competitive binding curves, as described, for
example, by Fisher (1980).

Antibody preparations prepared according to either the monoclonal or the polyclonal protocol
are useful in quantitative immunoassays which determine concentrations of antigen-bearing substances
in biological samples; they are also used semi-quantitatively or qualitatively to identify the presence of
antigen in a biological sample. The antibodies may also be used in therapeutic compositions for killing
cells expressing the protein or reducing the levels of the protein in the body.

Non-human animals or mammals, whether wild-type or transgenic, which express a different
species of olfactory receptor than the one to which antibody binding is desired, and animals which
do not express olfactory receptor (i.e. an olfactory receptor knock out animal as described herein) are
particularly useful for preparing antibodies. Olfactory receptor knock out animals will recognize all
or most of the exposed regions of an olfactory receptor protein as foreign antigens, and therefore
produce antibodies to a wider array of olfactory receptor epitopes. Moreover, smaller
polypeptides with only 10 to 30 amino acids may be useful in obtaining specific binding to any one
of the olfactory receptor proteins. In addition, the humoral immune system of animals which
produce a species of olfactory receptor that resembles the antigenic sequence will preferentially
recognize the differences between the animal's native olfactory receptor species and the antigen
sequence, and produce antibodies to these unique sites in the antigen sequence. Such a technique
will be particularly useful in obtaining antibodies that specifically bind to any one of the olfactory
receptor proteins.

The present invention also includes chimeric single chain Fv antibody fragments (Martineau et
al., 1998), antibody fragments obtained through phage display libraries (Ridder et al., 1995; Vaughan et
al., 1995) and humanized antibodies (Reinmann et al., 1997; Leger et al., 1997).

The antibodies of the invention may be labeled by any one of the radioactive, fluorescent or
enzymatic labels known in the art.

Consequently, the invention is also directed to a method for specifically detecting the
presence of a polypeptide according to the invention in a biological sample, said method comprising
the following steps:

a) bringing the biological sample into contact with an antibody according to the
invention;

b) detecting the antigen-antibody complex formed.

Also part of the invention is a diagnostic kit for detecting in vitro the presence of a
polypeptide according to the present invention in a biological sample, wherein said kit comprises:

a) a polyclonal or monoclonal antibody as described above, optionally labeled;

b) a reagent allowing the detection of the antigen-antibody complexes formed, said
reagent optionally carrying a label, or being able to be recognized itself by a labeled reagent,
more particularly in the case when the above-mentioned monoclonal or polyclonal antibody
is not itself labeled.

D. OLFACTORY RECEPTOR-RELATED BIALLELIC MARKERS

The invention also concerns olfactory receptor-related biallelic markers. As used herein the
term "olfactory receptor-related biallelic marker" relates to a set of biallelic markers in linkage
disequilibrium with the olfactory receptor gene. The term olfactory receptor-related biallelic marker
includes the biallelic markers designated A1 to A13.

WO 00/21985 PCT/IB99/01729

The biallelic markers of the present invention, namely A1 to A13, are disclosed in Table 2 of
Example 4. The 13 olfactory receptor-related biallelic markers, A1 to A13, are all located in the
genomic non-coding regions of the olfactory gene cluster of the invention. Their precise location on
the olfactory receptor genomic sequence and their single base polymorphism are indicated in Table 2
and also as features in the sequence listing for SEQ ID No 1. Appropriate pairs of primers allowing
the amplification of a nucleic acid containing the polymorphic base of the disclosed olfactory
receptor biallelic marker are also listed in Table 1 of Example 3 and in features of SEQ ID No 1.

In the present invention, the biallelic markers can be defined by nucleotide sequences
corresponding to oligonucleotides of 47 bases in length comprising one of the polymorphic bases
at the middle position. More particularly, the biallelic markers can be defined by the polynucleotides
P1 to P13.
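To make the 47-base definition concrete, the following Python sketch builds, for each allele, an oligonucleotide of 47 bases with the polymorphic base at the middle (23 bases on each side). It is an illustration only: the genomic string, the position and the alleles used below are hypothetical placeholders and are not taken from SEQ ID No 1 or from Table 2, and the function name `marker_47mers` is an assumption.

```python
def marker_47mers(genomic, snp_index, alleles, length=47):
    """Return one oligonucleotide per allele, centered on the polymorphic base.

    genomic   -- genomic sequence as a string
    snp_index -- 0-based index of the polymorphic base in `genomic`
    alleles   -- the two alternative bases of the biallelic marker
    """
    flank = length // 2  # 23 bases on each side for a 47-mer
    start = snp_index - flank
    end = snp_index + flank + 1
    if start < 0 or end > len(genomic):
        raise ValueError("not enough flanking sequence around the marker")
    left = genomic[start:snp_index]
    right = genomic[snp_index + 1:end]
    return [left + allele + right for allele in alleles]


# Hypothetical example sequence and marker, for illustration only.
example_genomic = "ACGT" * 20
oligo_a, oligo_g = marker_47mers(example_genomic, 40, ("A", "G"))
```

The two oligonucleotides obtained differ only at their central (24th) base, which is the property used above to define a marker.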

The biallelic markers contained in the olfactory gene cluster of the present invention, or a
subset of such biallelic markers, are useful tools to perform association studies, preferably to
perform association studies between the statistically significant occurrence of an allele of said
biallelic marker in the genome of an individual and a specific phenotype, including a phenotype
consisting of an alteration of the olfactory perception of odorant substances or molecules by said
individual. The biallelic markers of the invention can also be used, for example, in linkage analysis
in which evidence is sought for cosegregation between a locus and a putative trait locus using family
studies, such as an alteration of olfactory perception. In addition, the biallelic markers of the
invention may be included in the generation of any complete or partial genetic map of the human
genome. These different uses are specifically contemplated in the present invention and claims.

1. Identification of biallelic markers 

Any of a variety of methods can be used to screen a genomic fragment for single nucleotide
polymorphisms such as differential hybridization with oligonucleotide probes, detection of changes
in the mobility measured by gel electrophoresis or direct sequencing of the amplified nucleic acid.
A preferred method for identifying biallelic markers involves comparative sequencing of genomic
DNA fragments from an appropriate number of unrelated individuals.

In a first embodiment, DNA samples from unrelated individuals are pooled together, 
following which the genomic DNA of interest is amplified and sequenced. The nucleotide 
sequences thus obtained are then analyzed to identify significant polymorphisms. One of the major 
advantages of this method resides in the fact that the pooling of the DNA samples substantially 
reduces the number of DNA amplification reactions and sequencing reactions which must be carried 
out. Moreover, this method is sufficiently sensitive so that a biallelic marker obtained thereby 
usually shows a sufficient degree of informativeness to be useful in conducting association studies. 

In a second embodiment, the DNA samples are not pooled and are therefore amplified and 
sequenced individually. This method is usually preferred when biallelic markers need to be 
identified in order to perform association studies within candidate genes. Preferably, highly relevant 
gene regions such as promoter regions or exon regions may be screened for biallelic markers. A 
biallelic marker obtained using this method may show a lower degree of informativeness for 
conducting association studies, e.g. if the frequency of its less frequent allele is less than about 
10%. Such a biallelic marker will, however, be sufficiently informative to conduct association 
studies and it will further be appreciated that including less informative biallelic markers in the 
genetic analysis studies of the present invention may allow in some cases the direct identification of 
causal mutations, which may, depending on their penetrance, be rare mutations. 
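
As a non-limiting illustration of how the frequency of the less frequent allele relates to informativeness, one common measure is the expected heterozygosity of the marker. The sketch below assumes Hardy-Weinberg proportions; the function name `expected_heterozygosity` is chosen for this example only and is not a method recited in the present application.

```python
def expected_heterozygosity(minor_allele_frequency: float) -> float:
    """Expected fraction of heterozygous individuals for a biallelic marker.

    Assumes Hardy-Weinberg proportions: with allele frequencies p and q = 1 - p,
    the expected heterozygosity is 2 * p * q.
    """
    q = minor_allele_frequency
    if not 0.0 <= q <= 0.5:
        raise ValueError("the less frequent allele has a frequency between 0 and 0.5")
    p = 1.0 - q
    return 2.0 * p * q


# Compare a marker whose less frequent allele is at 10% with one at 50%.
for frequency in (0.10, 0.30, 0.50):
    print(frequency, round(expected_heterozygosity(frequency), 4))
```

Under this assumption a marker whose less frequent allele is at 10% has an expected heterozygosity of 0.18, against a maximum of 0.5 for a marker with two equally frequent alleles.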

The following is a description of the various parameters of a preferred method used by the 
inventors for the identification of the biallelic markers of the present invention. 

Genomic DNA Samples 

The genomic DNA samples from which the biallelic markers of the present invention are 
generated are preferably obtained from unrelated individuals corresponding to a heterogeneous 
population of known ethnic background. The number of individuals from whom DNA samples are 
obtained can vary substantially, preferably from about 10 to about 1000, preferably from about 50 to 
about 200 individuals. It is usually preferred to collect DNA samples from at least about 100 
individuals in order to have sufficient polymorphic diversity in a given population to identify as 
many markers as possible and to generate statistically significant results. 

As for the source of the genomic DNA to be subjected to analysis, any test sample can be 
foreseen without any particular limitation. These test samples include biological samples, which can 
be tested by the methods of the present invention described herein, and include human and animal 
body fluids such as whole blood, serum, plasma, cerebrospinal fluid, urine, lymph fluids, and 
various external secretions of the respiratory, intestinal and genitourinary tracts, tears, saliva, milk, 
white blood cells, myelomas and the like; biological fluids such as cell culture supernatants; fixed 
tissue specimens including tumor and non-tumor tissue and lymph node tissues; bone marrow 
aspirates and fixed cell specimens. The preferred source of genomic DNA used in the present 
invention is the peripheral venous blood of each donor. Techniques to prepare genomic DNA 
from biological samples are well known to the skilled technician. Details of a preferred embodiment 
are provided in Example 2. The person skilled in the art can choose to amplify pooled or unpooled 
DNA samples. 

DNA Amplification 

The identification of biallelic markers in a sample of genomic DNA may be facilitated 
through the use of DNA amplification methods. DNA samples can be pooled or unpooled for the 
amplification step. DNA amplification techniques are well known to those skilled in the art. 

Amplification techniques that can be used in the context of the present invention include, but 
are not limited to, the ligase chain reaction (LCR) described in EP-A-320 308, WO 9320227 and 
EP-A-439 182, the polymerase chain reaction (PCR, RT-PCR) and techniques such as the nucleic 
acid sequence based amplification (NASBA) described in Guatelli J.C. et al. (1990) and in Compton 
J. (1991), Q-beta amplification as described in European Patent Application No 4544610, strand 
displacement amplification as described in Walker et al. (1996) and EP A 684 315, and target 
mediated amplification as described in PCT Publication WO 9322461. 

LCR and Gap LCR are exponential amplification techniques; both depend on DNA ligase to 
join adjacent primers annealed to a DNA molecule. In Ligase Chain Reaction (LCR), probe pairs 
are used which include two primary (first and second) and two secondary (third and fourth) probes, 
all of which are employed in molar excess to target. The first probe hybridizes to a first segment of 
the target strand and the second probe hybridizes to a second segment of the target strand, the first 
and second segments being contiguous so that the primary probes abut one another in 5' phosphate- 
3' hydroxyl relationship, and so that a ligase can covalently fuse or ligate the two probes into a fused 
product. In addition, a third (secondary) probe can hybridize to a portion of the first probe and a 
fourth (secondary) probe can hybridize to a portion of the second probe in a similar abutting fashion. 
Of course, if the target is initially double stranded, the secondary probes also will hybridize to the 
target complement in the first instance. Once the ligated strand of primary probes is separated from 
the target strand, it will hybridize with the third and fourth probes, which can be ligated to form a 
complementary, secondary ligated product. It is important to realize that the ligated products are 
functionally equivalent to either the target or its complement. By repeated cycles of hybridization 
and ligation, amplification of the target sequence is achieved. A method for multiplex LCR has also 
been described (WO 9320227). Gap LCR (GLCR) is a version of LCR where the probes are not 
adjacent but are separated by 2 to 3 bases. 

For amplification of mRNAs, it is within the scope of the present invention to reverse 
transcribe mRNA into cDNA followed by polymerase chain reaction (RT-PCR); or, to use a single 
enzyme for both steps as described in U.S. Patent No. 5,322,770; or, to use Asymmetric Gap LCR 
(RT-AGLCR) as described by Marshall et al. (1994). AGLCR is a modification of GLCR that 
allows the amplification of RNA. 

The PCR technology is the preferred amplification technique used in the present invention. 
A variety of PCR techniques are familiar to those skilled in the art. For a review of PCR technology, 
see White (1997) and the publication entitled "PCR Methods and Applications" (1991, Cold Spring 
Harbor Laboratory Press). In each of these PCR procedures, PCR primers on either side of the 
nucleic acid sequences to be amplified are added to a suitably prepared nucleic acid sample along 
with dNTPs and a thermostable polymerase such as Taq polymerase, Pfu polymerase, or Vent 
polymerase. The nucleic acid in the sample is denatured and the PCR primers are specifically 
hybridized to complementary nucleic acid sequences in the sample. The hybridized primers are 
extended. Thereafter, another cycle of denaturation, hybridization, and extension is initiated. The 
cycles are repeated multiple times to produce an amplified fragment containing the nucleic acid 
sequence between the primer sites. PCR has further been described in several patents including US 
Patents 4,683,195; 4,683,202; and 4,965,188. 

The PCR technology is the preferred amplification technique used to identify new biallelic 
markers. A typical example of a PCR reaction suitable for the purposes of the present invention is 
provided in Example 3. 
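
The principle that a pair of primers located on either side of a region delimits the amplified fragment can be sketched, purely for illustration, as an idealized in silico amplification. The sketch assumes perfect primer matches and a single binding site for each primer; the function names and the toy sequences are hypothetical and do not correspond to the primers of Example 3.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return sequence.translate(_COMPLEMENT)[::-1]


def amplicon(template: str, forward_primer: str, reverse_primer: str) -> Optional[str]:
    """Return the fragment bounded by the two primers, or None if either fails to anneal.

    `template` is one strand written 5' to 3'. The forward primer matches this
    strand; the reverse primer anneals to it, so its reverse complement is
    searched for downstream of the forward primer.
    """
    start = template.find(forward_primer)
    if start < 0:
        return None
    downstream_site = reverse_complement(reverse_primer)
    end = template.find(downstream_site, start + len(forward_primer))
    if end < 0:
        return None
    return template[start:end + len(downstream_site)]


# Toy template and primers for illustration only.
toy_template = "TTTT" + "ACGTAC" + "GGGGCCCC" + "TTAGCA" + "AAAA"
product = amplicon(toy_template, forward_primer="ACGTAC", reverse_primer="TGCTAA")
print(product)
```

The returned fragment includes both primer sequences, as does a real amplification product; mismatch tolerance, melting temperature and secondary binding sites are deliberately ignored here.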

One of the aspects of the present invention is a method for the amplification of the human 
olfactory receptor gene, particularly of a fragment of the genomic sequence of SEQ ID No 1 or of 
the coding region sequences of SEQ ID Nos 2-11, or a fragment or a variant thereof in a test sample, 
preferably using the PCR technology. This method comprises the steps of: 

a) contacting a test sample with amplification reaction reagents comprising a pair of 
amplification primers as described above and located on either side of the polynucleotide 
region to be amplified, and 
b) optionally, detecting the amplification products. 

The invention also concerns a kit for the amplification of an olfactory receptor gene sequence, 
particularly of a portion of the genomic sequence of SEQ ID No 1 or of the coding region sequences 
of SEQ ID Nos 2-11, or a variant thereof in a test sample, wherein said kit comprises: 

a) a pair of oligonucleotide primers located on either side of the olfactory receptor region to 
be amplified; 

b) optionally, the reagents necessary for performing the amplification reaction. 

In one embodiment of the above amplification method and kit, the amplification product is 
detected by hybridization with a labeled probe having a sequence which is complementary to the 
amplified region. In another embodiment of the above amplification method and kit, primers 
comprise a sequence which is selected from the group consisting of the nucleotide sequences of B1 
to B11, C1 to C11, D1 to D13, and E1 to E13. 

In a first embodiment of the present invention, biallelic markers are identified using genomic 
sequence information generated by the inventors. Sequenced genomic DNA fragments are used to 
design primers for the amplification of 500 bp fragments. These 500 bp fragments are amplified 
from genomic DNA and are scanned for biallelic markers. Primers may be designed using the OSP 
software (Hillier L. and Green P., 1991). All primers may contain, upstream of the specific target 
bases, a common oligonucleotide tail that serves as a sequencing primer. Those skilled in the art are 
familiar with primer extensions, which can be used for these purposes. 

Sequencing Of Amplified Genomic DNA And Identification Of Single Nucleotide Polymorphisms 

The amplification products generated as described above are then sequenced using any 
method known and available to the skilled technician. Methods for sequencing DNA using either 
the dideoxy-mediated method (Sanger method) or the Maxam-Gilbert method are widely known to 



WO 00/21985 PCT/IB99/01729 


those of ordinary skill in the art. Such methods are for example disclosed in Sambrook et al. (1989). 
Alternative approaches include hybridization to high-density DNA probe arrays as described in Chee 
et al. (1996). 

Preferably, the amplified DNA is subjected to automated dideoxy terminator sequencing 
reactions using a dye-primer cycle sequencing protocol. Following gel image analysis and DNA 
sequence extraction, sequence data are automatically processed with adequate software to assess 
sequence quality. 

Polymorphism analysis software is used that detects the presence of biallelic sites among 
individual or pooled amplified fragment sequences. The polymorphism search is based on the presence 
of superimposed peaks in the electrophoresis pattern. These peaks, which present distinct colors, 
correspond to two different nucleotides at the same position on the sequence. The polymorphism has 
to be detected on both strands for validation. 
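
The search for superimposed peaks confirmed on both strands can be sketched as follows. This is a simplified illustration and not the software used by the inventors: the representation of a trace as per-position peak heights, the relative threshold of 0.3 and all names are assumptions, and the reverse-strand trace is assumed to have been converted beforehand to the orientation of the forward strand.

```python
from typing import Dict, List, Optional, Tuple

Trace = List[Dict[str, float]]  # for each position: base -> peak height


def superimposed_bases(peaks: Dict[str, float], ratio: float) -> Optional[Tuple[str, ...]]:
    """Return the two bases whose peaks are superimposed at a position, else None."""
    ranked = sorted(peaks, key=peaks.get, reverse=True)
    first, second = ranked[0], ranked[1]
    # A second peak counts only if it reaches `ratio` times the height of the main peak.
    if peaks[first] > 0 and peaks[second] >= ratio * peaks[first]:
        return tuple(sorted((first, second)))
    return None


def candidate_sites(forward: Trace, reverse: Trace, ratio: float = 0.3) -> Dict[int, Tuple[str, ...]]:
    """Keep positions where both strands show the same pair of superimposed bases."""
    sites = {}
    for position, (fwd, rev) in enumerate(zip(forward, reverse)):
        on_forward = superimposed_bases(fwd, ratio)
        on_reverse = superimposed_bases(rev, ratio)
        if on_forward is not None and on_forward == on_reverse:
            sites[position] = on_forward
    return sites


# Toy peak heights for three positions (illustrative values only).
forward = [
    {"A": 100, "C": 2, "G": 1, "T": 0},
    {"A": 60, "C": 1, "G": 55, "T": 0},
    {"A": 0, "C": 80, "G": 0, "T": 40},
]
reverse = [
    {"A": 95, "C": 3, "G": 2, "T": 1},
    {"A": 50, "C": 2, "G": 48, "T": 1},
    {"A": 0, "C": 90, "G": 0, "T": 5},
]
print(candidate_sites(forward, reverse))
```

In this toy data only the second position is retained: the third position shows a second peak on one strand only and is therefore not validated.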

Validation Of The Biallelic Markers Of The Present Invention 

The polymorphisms are evaluated for their usefulness as genetic markers by validating that 
both alleles are present in a population. Validation of the biallelic markers is accomplished by 
genotyping a group of individuals by a method of the invention and demonstrating that both alleles 
are present. Microsequencing is a preferred method of genotyping alleles. The validation by 
genotyping step may be performed on individual samples derived from each individual in the group 
or by genotyping a pooled sample derived from more than one individual. The group can be as 
small as one individual if that individual is heterozygous for the allele in question. Preferably the 
group contains at least three individuals, more preferably the group contains five or six individuals, 
so that a single validation test will be more likely to result in the validation of more of the biallelic 
markers that are being tested. It should be noted, however, that when the validation test is 
performed on a small group it may give a false negative result if, as a result of sampling error, 
none of the individuals tested carries one of the two alleles. Thus, the validation process is less 
useful in demonstrating that a particular initial result is an artifact than it is at demonstrating that 
there is a bona fide biallelic marker at a particular position in a sequence. All of the genotyping, 
haplotyping, association, and interaction study methods of the invention may optionally be 
performed solely with validated biallelic markers. 
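
The risk of a false negative in a small validation group can be illustrated with a simple calculation. The sketch below assumes that the 2n chromosomes of n unrelated diploid individuals are drawn independently from a population in which the less frequent allele has frequency q; under that assumption both alleles are observed with probability 1 - p^(2n) - q^(2n), where p = 1 - q. The function name is hypothetical.

```python
def probability_both_alleles_seen(minor_allele_frequency: float, individuals: int) -> float:
    """Probability that a group of diploid individuals carries both alleles.

    Assumes independent sampling of 2 * individuals chromosomes. The group
    fails to show both alleles only if every chromosome carries the same allele.
    """
    q = minor_allele_frequency
    if not 0.0 <= q <= 0.5:
        raise ValueError("the less frequent allele has a frequency between 0 and 0.5")
    if individuals < 1:
        raise ValueError("at least one individual is required")
    p = 1.0 - q
    chromosomes = 2 * individuals
    return 1.0 - p ** chromosomes - q ** chromosomes


# Compare groups of one, three and six individuals for a 10% minor allele.
for group_size in (1, 3, 6):
    print(group_size, round(probability_both_alleles_seen(0.10, group_size), 3))
```

Under these assumptions the probability of validating a true marker rises with group size but remains below one, which is consistent with the observation that a failed validation on a small group does not establish that the initial result was an artifact.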

2. Genotyping of biallelic markers 

The polymorphisms identified above can be further confirmed and their respective 
frequencies can be determined through various methods using the previously described primers and 
probes. These methods can also be useful for genotyping either new populations in association 
studies or individuals in the context of detection of alleles of biallelic markers which are known to 
be associated with a given trait. Those skilled in the art should note that the methods described 
below can be equally performed on individual or pooled DNA samples. 




Once a given polymorphic site has been found and characterized as a biallelic marker as 
described above, several methods can be used in order to determine the specific allele carried by an 
individual at the given polymorphic base. 

The identification of biallelic markers described previously allows the design of appropriate 
primers to amplify a region of the olfactory receptor gene cluster containing the polymorphic site of 
interest and for the detection of such polymorphisms. 

Genotyping can be performed using similar methods as those described above for the 
identification of the biallelic markers, or using other genotyping methods such as those further 
described below. In preferred embodiments, the comparison of sequences of amplified genomic 
fragments from different individuals is used to identify new biallelic markers whereas 
microsequencing is used for genotyping known biallelic markers in diagnostic and genetic analysis 
applications. 

In one embodiment the invention encompasses methods of genotyping comprising 
determining the identity of a nucleotide at an olfactory receptor-related biallelic marker or the 
complement thereof in a biological sample; optionally, wherein said olfactory receptor-related 
biallelic marker is selected from the group consisting of A1 to A13, and the complements thereof; 
optionally, wherein said biological sample is derived from a single subject; optionally, wherein the 
identity of the nucleotides at said biallelic marker is determined for both copies of said biallelic 
marker present in said individual's genome; optionally, wherein said biological sample is derived 
from multiple subjects; optionally, the genotyping methods of the invention encompass methods 
with any further limitation described in this disclosure, or those following, specified alone or in any 
combination; optionally, said method is performed in vitro; optionally, further comprising 
amplifying a portion of said sequence comprising the biallelic marker prior to said determining step; 
optionally, wherein said amplifying is performed by PCR, LCR, or replication of a recombinant 
vector comprising an origin of replication and said fragment in a host cell; optionally, wherein said 
determining is performed by a hybridization assay, a sequencing assay, a microsequencing assay, or 
an enzyme-based mismatch detection assay. 

Source of Nucleic Acids for genotyping 

Any source of nucleic acids, in purified or non-purified form, can be utilized as the starting 
nucleic acid, provided it contains or is suspected of containing the specific nucleic acid sequence 
desired. DNA or RNA may be extracted from cells, tissues, body fluids and the like as described 
above. While nucleic acids for use in the genotyping methods of the invention can be derived from 
any mammalian source, the test subjects and individuals from which nucleic acid samples are taken 
are generally understood to be human. 

Amplification Of DNA Fragments Comprising Biallelic Markers 

Methods and polynucleotides are provided to amplify a segment of nucleotides comprising 
one or more biallelic markers of the present invention. It will be appreciated that amplification of 
DNA fragments comprising biallelic markers may be used in various methods and for various 
purposes and is not restricted to genotyping. Nevertheless, many genotyping methods, although not 
all, require the previous amplification of the DNA region carrying the biallelic marker of interest. 
Such methods specifically increase the concentration or total number of sequences that span the 
biallelic marker or include that site and sequences located either distal or proximal to it. Diagnostic 
assays may also rely on amplification of DNA segments carrying a biallelic marker of the present 
invention. Amplification of DNA may be achieved by any method known in the art. Amplification 
techniques are described above in the section entitled "DNA Amplification". 

Some of these amplification methods are particularly suited for the detection of single 
nucleotide polymorphisms and allow the simultaneous amplification of a target sequence and the 
identification of the polymorphic nucleotide as it is further described below. 

The identification of biallelic markers as described above allows the design of appropriate 
oligonucleotides, which can be used as primers to amplify DNA fragments comprising the biallelic 
markers of the present invention. Amplification can be performed using the primers initially used to 
discover new biallelic markers which are described herein or any set of primers allowing the 
amplification of a DNA fragment comprising a biallelic marker of the present invention. 

In some embodiments the present invention provides primers for amplifying a DNA 
fragment containing one or more biallelic markers of the present invention. Preferred amplification 
primers are listed in Example 3. It will be appreciated that the primers listed are merely exemplary 
and that any other set of primers which produces amplification products containing one or more 
biallelic markers of the present invention is also of use. 

The spacing of the primers determines the length of the segment to be amplified. In the 
context of the present invention, amplified segments carrying biallelic markers can range in size 
from at least about 25 bp to 35 kbp. Amplification fragments from 25-3000 bp are typical, 
fragments from 50-1000 bp are preferred and fragments from 100-600 bp are highly preferred. It 
will be appreciated that amplification primers for the biallelic markers may be any sequence which 
allows the specific amplification of any DNA fragment carrying the markers. Amplification primers 
may be labeled or immobilized on a solid support as described in "Oligonucleotide probes and 
primers". 

Methods of Genotyping DNA samples for Biallelic Markers 

Any method known in the art can be used to identify the nucleotide present at a biallelic 
marker site. Since the biallelic marker allele to be detected has been identified and specified in the 
present invention, detection will prove simple for one of ordinary skill in the art by employing any 
of a number of techniques. Many genotyping methods require the previous amplification of the 
DNA region carrying the biallelic marker of interest. While the amplification of target or signal is 
often preferred at present, ultrasensitive detection methods which do not require amplification are 
also encompassed by the present genotyping methods. Methods well-known to those skilled in the 
art that can be used to detect biallelic polymorphisms include methods such as conventional dot blot 
analyses, single strand conformational polymorphism analysis (SSCP) described by Orita et 
al. (1989), denaturing gradient gel electrophoresis (DGGE), heteroduplex analysis, mismatch 
cleavage detection, and other conventional techniques as described in Sheffield et al. (1991), White 
et al. (1992), Grompe et al. (1989 and 1993). Another method for determining the identity of the 
nucleotide present at a particular polymorphic site employs a specialized exonuclease-resistant 
nucleotide derivative as described in US patent 4,656,127. 

Preferred methods involve directly determining the identity of the nucleotide present at a 
biallelic marker site by sequencing assay, enzyme-based mismatch detection assay, or hybridization 
assay. The following is a description of some preferred methods. A highly preferred method is the 
microsequencing technique. The term "sequencing" is generally used herein to refer to polymerase 
extension of duplex primer/template complexes and includes both traditional sequencing and 
microsequencing. 

1) Sequencing Assays 

The nucleotide present at a polymorphic site can be determined by sequencing methods. In 
a preferred embodiment, DNA samples are subjected to PCR amplification before sequencing as 
described above. DNA sequencing methods are described in "Sequencing Of Amplified Genomic 
DNA And Identification Of Single Nucleotide Polymorphisms". 

Preferably, the amplified DNA is subjected to automated dideoxy terminator sequencing 
reactions using a dye-primer cycle sequencing protocol. Sequence analysis allows the identification 
of the base present at the biallelic marker site. 

2) Microsequencing Assays 

In microsequencing methods, the nucleotide at a polymorphic site in a target DNA is 
detected by a single nucleotide primer extension reaction. This method involves appropriate 
microsequencing primers which hybridize just upstream of the polymorphic base of interest in the 
target nucleic acid. A polymerase is used to specifically extend the 3' end of the primer with one 
single ddNTP (chain terminator) complementary to the nucleotide at the polymorphic site. Next the 
identity of the incorporated nucleotide is determined in any suitable way. 
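
The single nucleotide primer extension principle can be sketched as follows. This is an idealized illustration assuming a perfectly matched primer and a single binding site; the function names and the toy sequences are hypothetical and are not the microsequencing primers of Example 5.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return sequence.translate(_COMPLEMENT)[::-1]


def incorporated_ddntp(target: str, primer: str) -> str:
    """Return the ddNTP added to the 3' end of `primer` annealed to `target`.

    Both sequences are written 5' to 3'. The primer pairs with its reverse
    complement on the target; the next template base lies immediately 5' of
    that binding site on the target strand, and the incorporated terminator
    is complementary to it.
    """
    binding_start = target.find(reverse_complement(primer))
    if binding_start < 1:
        raise ValueError("primer does not anneal next to a template base")
    template_base = target[binding_start - 1]
    return "dd" + template_base.translate(_COMPLEMENT) + "TP"


# Toy example: the primer ends just before an A/G polymorphism on the primer-sense strand.
primer = "GATTACAG"
target_allele_a = reverse_complement("GATTACAG" + "A" + "CCTT")
target_allele_g = reverse_complement("GATTACAG" + "G" + "CCTT")
print(incorporated_ddntp(target_allele_a, primer))
print(incorporated_ddntp(target_allele_g, primer))
```

Because only one terminator is added, the identity of that terminator directly reports the allele present on the template.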

Typically, microsequencing reactions are carried out using fluorescent ddNTPs and the 
extended microsequencing primers are analyzed by electrophoresis on ABI 377 sequencing 
machines to determine the identity of the incorporated nucleotide as described in EP 412 883. 
Alternatively, capillary electrophoresis can be used in order to process a higher number of assays 
simultaneously. An example of a typical microsequencing procedure that can be used in the context 
of the present invention is provided in Example 5. 

Different approaches can be used for the labeling and detection of ddNTPs. A homogeneous 
phase detection method based on fluorescence resonance energy transfer has been described by Chen 
and Kwok (1997) and Chen et al. (1997). In this method, amplified genomic DNA fragments 
containing polymorphic sites are incubated with a 5'-fluorescein-labeled primer in the presence of 
allelic dye-labeled dideoxyribonucleoside triphosphates and a modified Taq polymerase. The 
dye-labeled primer is extended one base by the dye-terminator specific for the allele present on the 
template. At the end of the genotyping reaction, the fluorescence intensities of the two dyes in the 
reaction mixture are analyzed directly without separation or purification. All these steps can be 
performed in the same tube and the fluorescence changes can be monitored in real time. 
Alternatively, the extended primer may be analyzed by MALDI-TOF Mass Spectrometry. The base 
at the polymorphic site is identified by the mass added onto the microsequencing primer (see Haff 
and Smirnov, 1997). 

Microsequencing may be achieved by the established microsequencing method or by 
developments or derivatives thereof. Alternative methods include several solid-phase 
microsequencing techniques. The basic microsequencing protocol is the same as described 
previously, except that the method is conducted as a heterogeneous phase assay, in which the primer 
or the target molecule is immobilized or captured onto a solid support. To simplify the primer 
separation and the terminal nucleotide addition analysis, oligonucleotides are attached to solid 
supports or are modified in such ways that permit affinity separation as well as polymerase 
extension. The 5' ends and internal nucleotides of synthetic oligonucleotides can be modified in a 
number of different ways to permit different affinity separation approaches, e.g., biotinylation. If a 
single affinity group is used on the oligonucleotides, the oligonucleotides can be separated from the 
incorporated terminator reagent. This eliminates the need for physical or size separation. More than 
one oligonucleotide can be separated from the terminator reagent and analyzed simultaneously if 
more than one affinity group is used. This permits the analysis of several nucleic acid species or 
more nucleic acid sequence information per extension reaction. The affinity group need not be on 
the priming oligonucleotide but could alternatively be present on the template. For example, 
immobilization can be carried out via an interaction between biotinylated DNA and 
streptavidin-coated microtitration wells or avidin-coated polystyrene particles. In the same manner, 
oligonucleotides or templates may be attached to a solid support in a high-density format. In such 
solid phase microsequencing reactions, incorporated ddNTPs can be radiolabeled (Syvanen, 1994) 
or linked to fluorescein (Livak and Hainer, 1994). The detection of radiolabeled ddNTPs can be 
achieved through scintillation-based techniques. The detection of fluorescein-linked ddNTPs can be 
based on the binding of antifluorescein antibody conjugated with alkaline phosphatase, followed by 
incubation with a chromogenic substrate (such as p-nitrophenyl phosphate). Other possible 
reporter-detection pairs include: ddNTP linked to dinitrophenyl (DNP) and anti-DNP alkaline 
phosphatase conjugate (Harju et al., 1993) or biotinylated ddNTP and horseradish 
peroxidase-conjugated streptavidin with o-phenylenediamine as a substrate (WO 92/15712). As yet 
another alternative solid-phase microsequencing procedure, Nyren et al. (1993) described a method 
relying on the detection of DNA polymerase activity by an enzymatic luminometric inorganic 
pyrophosphate detection assay (ELIDA). 

Pastinen et al. (1997) describe a method for multiplex detection of single nucleotide 
polymorphism in which the solid phase minisequencing principle is applied to an oligonucleotide 
array format. High-density arrays of DNA probes attached to a solid support (DNA chips) are 
further described below. 

In one aspect the present invention provides polynucleotides and methods to genotype one or 
more biallelic markers of the present invention by performing a microsequencing assay. Preferred 
microsequencing primers include the nucleotide sequences D1 to Dn and E1 to En. It will be 
appreciated that the microsequencing primers listed in Example 5 are merely exemplary and that 
any primer having a 3' end immediately adjacent to the polymorphic nucleotide may be used. 
Similarly, it will be appreciated that microsequencing analysis may be performed for any biallelic 
marker or any combination of biallelic markers of the present invention. One aspect of the present 
invention is a solid support which includes one or more microsequencing primers listed in Example 
5, or fragments comprising at least 8, 12, 15, 20, 25, 30, 40, or 50 consecutive nucleotides thereof, to 
the extent that such lengths are consistent with the primer described, and having a 3' terminus 
immediately upstream of the corresponding biallelic marker, for determining the identity of a 
nucleotide at a biallelic marker site. 

3) Mismatch detection assays based on polymerases and ligases 

In one aspect the present invention provides polynucleotides and methods to determine the 
allele of one or more biallelic markers of the present invention in a biological sample, by mismatch 
detection assays based on polymerases and/or ligases. These assays are based on the specificity of 
polymerases and ligases. Polymerization reactions place particularly stringent requirements on 
correct base pairing of the 3' end of the amplification primer, and the joining of two oligonucleotides 
hybridized to a target DNA sequence is quite sensitive to mismatches close to the ligation site, 
especially at the 3' end. Methods, primers and various parameters to amplify DNA fragments 
comprising biallelic markers of the present invention are further described above in "Amplification 
Of DNA Fragments Comprising Biallelic Markers". 

Allele Specific Amplification Primers 

Discrimination between the two alleles of a biallelic marker can also be achieved by allele 
specific amplification, a selective strategy, whereby one of the alleles is amplified without 
amplification of the other allele. For allele specific amplification, at least one member of the pair of 
primers is sufficiently complementary with a region of an olfactory receptor gene comprising the 
polymorphic base of a biallelic marker of the present invention to hybridize therewith and to initiate 
the amplification. Such primers are able to discriminate between the two alleles of a biallelic 
marker. 

This is accomplished by placing the polymorphic base at the 3' end of one of the 
amplification primers. Because extension proceeds from the 3' end of the primer, a mismatch at or 
near this position has an inhibitory effect on amplification. Therefore, under appropriate 
amplification conditions, these primers only direct amplification on their complementary allele. 
Determining the precise location of the mismatch and the corresponding assay conditions are well 
within the ordinary skill in the art. 
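
The placement of the polymorphic base at the 3' end of an allele specific primer can be sketched as follows. The sketch is idealized: it treats a 3'-terminal mismatch as fully blocking extension, whereas in practice discrimination depends on the assay conditions. The function names, the primer length of 20 bases and the toy sequence are assumptions made for illustration.

```python
from typing import Dict, Tuple


def allele_specific_primers(upstream: str, alleles: Tuple[str, str], length: int = 20) -> Dict[str, str]:
    """Build one primer per allele, each ending at its 3' end with the allele base.

    `upstream` is the sequence immediately 5' of the polymorphic base on the
    strand having the same sequence as the primers.
    """
    if length < 2 or len(upstream) < length - 1:
        raise ValueError("not enough upstream sequence for the requested primer length")
    shared = upstream[-(length - 1):]  # bases common to both primers
    return {allele: shared + allele for allele in alleles}


def is_extended(primer: str, allele_on_template: str) -> bool:
    """Idealized rule: extension occurs only when the 3'-terminal base matches the allele."""
    return primer[-1] == allele_on_template


# Toy upstream sequence for illustration only.
primers = allele_specific_primers("ACGTACGTACGTACGTACGTAGG", alleles=("A", "G"))
for allele, primer in primers.items():
    print(allele, primer, is_extended(primer, "A"))
```

The two primers differ only in their last base, so that on a template carrying allele A only the primer ending in A is extended in this model.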

Ligation/Amplification Based Methods 

The "Oligonucleotide Ligation Assay" (OLA) uses two oligonucleotides which are designed 
to be capable of hybridizing to abutting sequences of a single strand of a target molecule. One of 
the oligonucleotides is biotinylated, and the other is detectably labeled. If the precise 
complementary sequence is found in a target molecule, the oligonucleotides will hybridize such that 
their termini abut, and create a ligation substrate that can be captured and detected. OLA is capable 
of detecting single nucleotide polymorphisms and may be advantageously combined with PCR as 
described by Nickerson et al. (1990). In this method, PCR is used to achieve the exponential 
amplification of target DNA, which is then detected using OLA. 

Other amplification methods which are particularly suited for the detection of single 
nucleotide polymorphism include LCR (ligase chain reaction) and Gap LCR (GLCR), which are 
described above in "DNA Amplification". LCR uses two pairs of probes to exponentially amplify a 
specific target. The sequences of each pair of oligonucleotides are selected to permit the pair to 
hybridize to abutting sequences of the same strand of the target. Such hybridization forms a 
substrate for a template-dependent ligase. In accordance with the present invention, LCR can be 
performed with oligonucleotides having the proximal and distal sequences of the same strand of a 
biallelic marker site. In one embodiment, either oligonucleotide will be designed to include the 
biallelic marker site. In such an embodiment, the reaction conditions are selected such that the 
oligonucleotides can be ligated together only if the target molecule either contains or lacks the 
specific nucleotide that is complementary to the biallelic marker on the oligonucleotide. In an 
alternative embodiment, the oligonucleotides will not include the biallelic marker, such that when 
they hybridize to the target molecule, a "gap" is created as described in WO 90/01069. This gap is 
then "filled" with complementary dNTPs (as mediated by DNA polymerase), or by an additional 
pair of oligonucleotides. Thus at the end of each cycle, each single strand has a complement capable 
of serving as a target during the next cycle and exponential allele-specific amplification of the 
desired sequence is obtained. 

Ligase/Polymerase-mediated Genetic Bit Analysis™ is another method for determining the
identity of a nucleotide at a preselected site in a nucleic acid molecule (WO 95/21271). This method
involves the incorporation of a nucleoside triphosphate that is complementary to the nucleotide
present at the preselected site onto the terminus of a primer molecule, and their subsequent ligation
to a second oligonucleotide. The reaction is monitored by detecting a specific label attached to the
reaction's solid phase or by detection in solution.
4) Hybridization Assay Methods
A preferred method of determining the identity of the nucleotide present at a biallelic marker
site involves nucleic acid hybridization. The hybridization probes, which can be conveniently used
in such reactions, preferably include the probes defined herein. Any hybridization assay may be
used including Southern hybridization, Northern hybridization, dot blot hybridization and
solid-phase hybridization (see Sambrook et al., 1989).

Hybridization refers to the formation of a duplex structure by two single stranded nucleic
acids due to complementary base pairing. Hybridization can occur between exactly complementary
nucleic acid strands or between nucleic acid strands that contain minor regions of mismatch.
Specific probes can be designed that hybridize to one form of a biallelic marker and not to the other
and therefore are able to discriminate between different allelic forms. Allele-specific probes are
often used in pairs, one member of a pair showing a perfect match to a target sequence containing the
original allele and the other showing a perfect match to the target sequence containing the alternative
allele. Hybridization conditions should be sufficiently stringent that there is a significant difference
in hybridization intensity between alleles, and preferably an essentially binary response, whereby a
probe hybridizes to only one of the alleles. Stringent, sequence specific hybridization conditions,
under which a probe will hybridize only to the exactly complementary target sequence, are well
known in the art (Sambrook et al., 1989). Stringent conditions are sequence dependent and will be
different in different circumstances. Generally, stringent conditions are selected to be about 5°C
lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and
pH. Although such hybridization can be performed in solution, it is preferred to employ a
solid-phase hybridization assay. The target DNA comprising a biallelic marker of the present invention
may be amplified prior to the hybridization reaction. The presence of a specific allele in the sample
is determined by detecting the presence or the absence of stable hybrid duplexes formed between the
probe and the target DNA. The detection of hybrid duplexes can be carried out by a number of
methods. Various detection assay formats are well known which utilize detectable labels bound to
either the target or the probe to enable detection of the hybrid duplexes. Typically, hybridization
duplexes are separated from unhybridized nucleic acids and the labels bound to the duplexes are then
detected. Those skilled in the art will recognize that wash steps may be employed to wash away
excess target DNA or probe as well as unbound conjugate. Further, standard heterogeneous assay
formats are suitable for detecting the hybrids using the labels present on the primers and probes.
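
As a rough illustration of the "about 5°C lower than Tm" guideline given above, the sketch below
estimates Tm with the Wallace rule (2°C per A or T, 4°C per G or C). That rule is an assumption
introduced here for illustration: it is a coarse approximation intended for short oligonucleotides
and is not a formula given in this application. The probe sequence and function names are
likewise hypothetical.

```python
def wallace_tm(oligo):
    """Approximate melting temperature (°C) of a short oligonucleotide (Wallace rule)."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


def stringent_temperature(oligo, offset=5):
    """Hybridization temperature chosen `offset` degrees below the estimated Tm."""
    return wallace_tm(oligo) - offset


# Hypothetical 16-base probe with 8 A/T and 8 G/C bases.
probe = "ACGTTGCAAGCTTGCA"
print("estimated Tm:", wallace_tm(probe))
print("stringent hybridization temperature:", stringent_temperature(probe))
```

In practice the ionic strength, pH and probe length all shift the melting point, so an estimate of
this kind would only serve as a starting point for empirical optimization.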

Two recently developed assays allow hybridization-based allele discrimination with no need
for separations or washes (see Landegren U. et al., 1998). The TaqMan assay takes advantage of
the 5' nuclease activity of Taq DNA polymerase to digest a DNA probe annealed specifically to the
accumulating amplification product. TaqMan probes are labeled with a donor-acceptor dye pair that
interacts via fluorescence energy transfer. Cleavage of the TaqMan probe by the advancing
polymerase during amplification dissociates the donor dye from the quenching acceptor dye, greatly
increasing the donor fluorescence. All reagents necessary to detect two allelic variants can be
assembled at the beginning of the reaction and the results are monitored in real time (see Livak et al.,
1995). In an alternative homogeneous hybridization based procedure, molecular beacons are used
for allele discriminations. Molecular beacons are hairpin-shaped oligonucleotide probes that report
the presence of specific nucleic acids in homogeneous solutions. When they bind to their targets
they undergo a conformational reorganization that restores the fluorescence of an internally
quenched fluorophore (Tyagi et al., 1998).

The polynucleotides provided herein can be used to produce probes which can be used in
hybridization assays for the detection of biallelic marker alleles in biological samples. These probes
are characterized in that they preferably comprise between 8 and 50 nucleotides, and in that they are
sufficiently complementary to a sequence comprising a biallelic marker of the present invention to
hybridize thereto and preferably sufficiently specific to be able to discriminate the targeted sequence
for only one nucleotide variation. A particularly preferred probe is 25 nucleotides in length.
Preferably the biallelic marker is within 4 nucleotides of the center of the polynucleotide probe. In
particularly preferred probes, the biallelic marker is at the center of said polynucleotide. Preferred
probes comprise a nucleotide sequence selected from the group consisting of amplicons listed in
Table 1 and the sequences complementary thereto, or a fragment thereof, said fragment comprising
at least about 8 consecutive nucleotides, preferably 10, 15, 20, more preferably 25, 30, 40, 47, or 50
consecutive nucleotides and containing a polymorphic base. Preferred probes comprise a nucleotide
sequence selected from the group consisting of P1 to P13 and the sequences complementary thereto.
In preferred embodiments the polymorphic base(s) are within 5, 4, 3, 2, or 1 nucleotides of the center
of the said polynucleotide, more preferably at the center of said polynucleotide.
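
The geometry of a probe carrying the polymorphic base at its center can be sketched as follows.
This is only an illustration: the 47-base sequence is a placeholder and not one of the amplicons of
Table 1, and the function names and the allele pair T/C are hypothetical.

```python
def centered_probe(sequence, marker_index, length=25):
    """Return the `length`-base window of `sequence` with the marker at its center."""
    half = length // 2
    start = marker_index - half
    end = start + length
    if start < 0 or end > len(sequence):
        raise ValueError("marker is too close to the end of the sequence")
    return sequence[start:end]


def allele_probes(sequence, marker_index, alleles, length=25):
    """Return one probe per allele; the probes differ only at the central base."""
    reference = centered_probe(sequence, marker_index, length)
    half = length // 2
    return {a: reference[:half] + a + reference[half + 1:] for a in alleles}


COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the complementary strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


# Placeholder 47-base sequence with the polymorphic base at index 23 (its center).
amplicon = "ACGT" * 11 + "ACG"
probes = allele_probes(amplicon, 23, ("T", "C"))
for allele, probe in probes.items():
    print(allele, probe, reverse_complement(probe))
```

With an odd probe length such as 25, the marker falls exactly on the central position (index 12);
shifting `marker_index` relative to the window would give the "within 4 nucleotides of the
center" variants.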

Preferably the probes of the present invention are labeled or immobilized on a solid support. 
Labels and solid supports are further described in "Oligonucleotide Probes and Primers". The 
probes can be non-extendable as described in "Oligonucleotide Probes and Primers". 

By assaying the hybridization to an allele specific probe, one can detect the presence or
absence of a biallelic marker allele in a given sample. High-throughput parallel hybridization in
array format is specifically encompassed within "hybridization assays" and is described below.
5) Hybridization To Addressable Arrays Of Oligonucleotides 

Hybridization assays based on oligonucleotide arrays rely on the differences in hybridization
stability of short oligonucleotides to perfectly matched and mismatched target sequence variants.
Efficient access to polymorphism information is obtained through a basic structure comprising
high-density arrays of oligonucleotide probes attached to a solid support (e.g., the chip) at selected
positions. Each DNA chip can contain thousands to millions of individual synthetic DNA probes
arranged in a grid-like pattern and miniaturized to the size of a dime.

The chip technology has already been applied with success in numerous cases. For example,
the screening of mutations has been undertaken in the BRCA1 gene, in S. cerevisiae mutant strains,
and in the protease gene of HIV-1 virus (Hacia et al., 1996; Shoemaker et al., 1996; Kozal et al.,
1996). Chips of various formats for use in detecting biallelic polymorphisms can be produced on a
customized basis by Affymetrix (GeneChip™), Hyseq (HyChip and HyGnostics), and Protogene
Laboratories.

In general, these methods employ arrays of oligonucleotide probes that are complementary
to target nucleic acid sequence segments from an individual, which target sequences include a
polymorphic marker. EP 785280 describes a tiling strategy for the detection of single nucleotide
polymorphisms. Briefly, arrays may generally be "tiled" for a large number of specific
polymorphisms. By "tiling" is generally meant the synthesis of a defined set of oligonucleotide
probes which is made up of a sequence complementary to the target sequence of interest, as well as
preselected variations of that sequence, e.g., substitution of one or more given positions with one or
more members of the basis set of nucleotides. Tiling strategies are further described in PCT
application No. WO 95/11995. In a particular aspect, arrays are tiled for a number of specific,
identified biallelic marker sequences. In particular, the array is tiled to include a number of
detection blocks, each detection block being specific for a specific biallelic marker or a set of
biallelic markers. For example, a detection block may be tiled to include a number of probes, which
span the sequence segment that includes a specific polymorphism. To ensure probes that are
complementary to each allele, the probes are synthesized in pairs differing at the biallelic marker. In
addition to the probes differing at the polymorphic base, monosubstituted probes are also generally
tiled within the detection block. These monosubstituted probes have bases at and up to a certain
number of bases in either direction from the polymorphism, substituted with the remaining
nucleotides (selected from A, T, G, C and U). Typically the probes in a tiled detection block will
include substitutions of the sequence positions up to and including those that are 5 bases away from
the biallelic marker. The monosubstituted probes provide internal controls for the tiled array, to
distinguish actual hybridization from artefactual cross-hybridization. Upon completion of
hybridization with the target sequence and washing of the array, the array is scanned to determine
the position on the array to which the target sequence hybridizes. The hybridization data from the
scanned array is then analyzed to identify which allele or alleles of the biallelic marker are present in
the sample. Hybridization and scanning may be carried out as described in PCT application No. WO
92/10092 and WO 95/11995 and US patent No. 5,424,186.
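
The composition of one tiled detection block, as described above, can be sketched in a few lines.
This is a simplified illustration and not the tiling procedure of EP 785280 or WO 95/11995: the
function name, the 15-base probe length and the placeholder sequence are assumptions, and the
probes are written in the orientation of the target strand (a physical probe would be the
complement).

```python
BASES = "ACGT"


def detection_block(sequence, marker_index, alleles, probe_len=15, window=5):
    """Build the probe set of one detection block centered on a biallelic marker.

    Returns the allele-specific probe pair plus the monosubstituted control probes
    covering every position up to `window` bases on either side of the marker.
    """
    half = probe_len // 2
    start = marker_index - half
    if start < 0 or start + probe_len > len(sequence) or window > half:
        raise ValueError("marker too close to the sequence end, or window too wide")
    reference = sequence[start:start + probe_len]

    def substitute(pos, base):
        return reference[:pos] + base + reference[pos + 1:]

    # One probe per allele, differing only at the central (polymorphic) base.
    allele_probes = {allele: substitute(half, allele) for allele in alleles}

    # Control probes: at the marker, use the bases that are neither allele;
    # elsewhere, use the three bases that differ from the reference base.
    monosubstituted = []
    for pos in range(half - window, half + window + 1):
        excluded = set(alleles) if pos == half else {reference[pos]}
        for base in BASES:
            if base not in excluded:
                monosubstituted.append(substitute(pos, base))
    return {"allele_probes": allele_probes, "monosubstituted": monosubstituted}


# Placeholder 40-base target with an A/G marker at index 20.
block = detection_block("ACGT" * 10, 20, ("A", "G"))
print(block["allele_probes"])
print(len(block["monosubstituted"]), "monosubstituted control probes")
```

With a window of 5 bases on each side, the block holds 10 flanking positions with three
substitutions each plus two non-allelic substitutions at the marker itself.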

Thus, in some embodiments, the chips may comprise an array of nucleic acid sequences of
fragments of about 15 nucleotides in length. In further embodiments, the chip may comprise an
array including at least one of the sequences selected from the group consisting of amplicons listed
in Table 1 and the sequences complementary thereto, or a fragment thereof, said fragment comprising
at least about 8 consecutive nucleotides, preferably 10, 15, 20, more preferably 25, 30, 40, 47, or 50
consecutive nucleotides and containing a polymorphic base. In preferred embodiments the
polymorphic base is within 5, 4, 3, 2, or 1 nucleotides of the center of the said polynucleotide, more
preferably at the center of said polynucleotide. In some embodiments, the chip may comprise an
array of at least 2, 3, 4, 5, 6, 7, 8 or more of these polynucleotides of the invention. Solid supports
and polynucleotides of the present invention attached to solid supports are further described in
"Oligonucleotide Probes And Primers".
6) Integrated Systems 

Another technique, which may be used to analyze polymorphisms, includes multicomponent
integrated systems, which miniaturize and compartmentalize processes such as PCR and capillary
electrophoresis reactions in a single functional device. An example of such a technique is disclosed in
US patent 5,589,136, which describes the integration of PCR amplification and capillary
electrophoresis in chips.

Integrated systems can be envisaged mainly when microfluidic systems are used. These
systems comprise a pattern of microchannels designed onto a glass, silicon, quartz, or plastic wafer
included on a microchip. The movements of the samples are controlled by electric, electroosmotic,
or hydrostatic forces applied across different areas of the microchip to create functional microscopic
valves and pumps with no moving parts.

For genotyping biallelic markers, the microfluidic system may integrate nucleic acid
amplification, microsequencing, capillary electrophoresis and a detection method such as
laser-induced fluorescence detection.

E. EXPRESSION OF AN OLF1 TO OLF10 CODING POLYNUCLEOTIDE

Any of the coding polynucleotides of the invention may be inserted into recombinant vectors
for expression in a recombinant host cell or a recombinant host organism.

Thus, the present invention also encompasses a family of recombinant vectors that contain
a coding polynucleotide selected from the coding polynucleotides of the OLF1 to OLF10 genes.
Consequently, the present invention further deals with a recombinant vector comprising a
polynucleotide comprising any of the coding sequences of SEQ ID No 1, preferably those selected
from the group consisting of SEQ ID Nos 2-11.

In a first preferred embodiment, the present invention relates to expression vectors which 
include nucleic acids encoding an olfactory receptor protein described herein under the control of an 
exogenous regulatory sequence. 

In a second preferred embodiment, a recombinant vector of the invention is used to amplify
the inserted polynucleotide derived from an olfactory receptor genomic sequence selected from the
group consisting of the nucleic acids of SEQ ID No 1 and of olfactory receptor cDNAs, for example
the open reading frames of SEQ ID Nos 2-11, in a suitable cell host, this polynucleotide being
amplified every time that the recombinant vector replicates.

More particularly, the present invention relates to expression vectors which include nucleic
acids encoding an olfactory receptor protein, preferably the olfactory receptor proteins of the amino
acid sequence of SEQ ID Nos 12-21 or variants or fragments thereof, under the control of an
exogenous regulatory sequence.

Generally, a recombinant vector of the invention may comprise any of the polynucleotides
described herein, including regulatory sequences, and coding sequences, as well as any olfactory
receptor primer or probe as defined above. More particularly, the recombinant vectors of the present
invention can comprise any of the polynucleotides described in the "Coding Regions of the olfactory
receptor gene" section, "Genomic sequence of the olfactory receptor gene" section, the
"Oligonucleotide Probes And Primers" section and the "Polynucleotide constructs" section.

Some of the elements which can be found in the vectors of the present invention are 
described in further detail in the following sections. 

Vectors

A recombinant vector according to the invention comprises, but is not limited to, a YAC
(Yeast Artificial Chromosome), a BAC (Bacterial Artificial Chromosome), a phage, a phagemid, a
cosmid, a plasmid or even a linear DNA molecule which may consist of chromosomal,
non-chromosomal and synthetic DNA. Such a recombinant vector can comprise a transcriptional unit
comprising an assembly of

(1) a genetic element or elements having a regulatory role in gene expression, for 
example promoters or enhancers. Enhancers are cis-acting elements of DNA, usually from about 
10 to 300 bp that act on the promoter to increase the transcription. 

(2) a structural or coding sequence which is transcribed into mRNA and eventually
translated into a polypeptide, and

(3) appropriate transcription initiation and termination sequences. Structural units
intended for use in yeast or eukaryotic expression systems preferably include a leader sequence
enabling extracellular secretion of translated protein by a host cell. Alternatively, where
recombinant protein is expressed without a leader or transport sequence, it may include an
N-terminal residue. This residue may or may not be subsequently cleaved from the expressed
recombinant protein to provide a final product.

Generally, recombinant expression vectors will include origins of replication, selectable
markers permitting transformation of the host cell, and a promoter derived from a highly expressed
gene to direct transcription of a downstream structural sequence. The heterologous structural
sequence is assembled in appropriate phase with translation initiation and termination sequences,
and preferably a leader sequence capable of directing secretion of translated protein into the
periplasmic space or extracellular medium.

The selectable marker genes for selection of transformed host cells are preferably
dihydrofolate reductase or neomycin resistance for eukaryotic cell culture, TRP1 for S. cerevisiae or
tetracycline, rifampicin or ampicillin resistance in E. coli, or levan saccharase for mycobacteria.

For facilitating the purification of the expressed protein and increasing its stability, the
coding sequence of an olfactory receptor according to the invention can be fused at its N- or
C-terminus with a protein such as MBP (maltose binding protein) or GST (glutathione S-transferase)
or with a tag such as a poly-histidine tag, Strep tag, Bio tag, or flag peptide epitope tag, those being
detailed below. Thioredoxin can optionally be inserted between the olfactory receptor and the tag.

Useful expression vectors for bacterial use are constructed by inserting a structural DNA
sequence encoding a desired polypeptide with suitable translation initiation and termination signals
in operable reading phase with a functional promoter. The vector will comprise one or more
phenotypic selectable markers and an origin of replication to ensure maintenance of the vector and
to, if desirable, provide amplification within the host.

As a representative but non-limiting example, useful expression vectors for bacterial use can
comprise a selectable marker and bacterial origin of replication derived from commercially available
plasmids comprising genetic elements of pBR322 (ATCC 37017). Such commercial vectors include,
for example, pKK223-3 (Pharmacia, Uppsala, Sweden), and GEM1 (Promega Biotec, Madison, WI,
USA).

Large numbers of suitable vectors and promoters are known to those of skill in the art, and
commercially available, such as bacterial vectors: pQE70, pQE60, pQE-9 (Qiagen), pbs, pD10,
phagescript, psiX174, pbluescript SK, pbsks, pNH8A, pNH16A, pNH18A, pNH46A (Stratagene);
ptrc99a, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia); or eukaryotic vectors: pWLNEO,
pSV2CAT, pOG44, pXT1, pSG (Stratagene); pSVK3, pBPV, pMSG, pSVL (Pharmacia);
baculovirus transfer vector pVL1392/1393 (Pharmingen); pQE-30 (QIAexpress).

A suitable vector for the expression of the olfactory receptors defined above or their peptide
fragments is a baculovirus vector that can be propagated in insect cells and in insect cell lines. A
specific suitable host vector system is the pVL1392/1393 baculovirus transfer vector (Pharmingen)
that is used to transfect the SF9 cell line (ATCC N°CRL 1711) which is derived from Spodoptera
frugiperda.

Other suitable vectors for the expression of an olfactory receptor or their peptide fragments
or variants in a baculovirus expression system include those described by Chai et al. (1993), Vlasak
et al. (1983) and Lenhard et al. (1996).

Mammalian expression vectors will comprise an origin of replication, a suitable promoter
and enhancer, and also any necessary ribosome binding sites, polyadenylation site, splice donor and
acceptor sites, transcriptional termination sequences, and 5' flanking nontranscribed sequences.
DNA sequences derived from the SV40 viral genome, for example SV40 origin, early promoter,
enhancer, splice and polyadenylation sites, may be used to provide the required nontranscribed
genetic elements.

Promoters 

The suitable promoter regions used in the expression vectors according to the present
invention are chosen taking into account the cell host in which the heterologous gene has to be
expressed.

A suitable promoter may be heterologous with respect to the nucleic acid for which it
controls the expression or alternatively can be endogenous to the native polynucleotide containing
the coding sequence to be expressed. Additionally, the promoter is generally heterologous with
respect to the recombinant vector sequences within which the construct promoter/coding sequence
has been inserted.

Thus, the promoter is selected among the group comprising:

- an internal or an endogenous promoter, such as the natural promoter associated
with the structural gene coding for the desired olfactory receptor polypeptide or the fragment or
variant thereof; such a promoter may be completed by a regulatory element derived from the
vertebrate host, in particular an activator element;

- a promoter derived from a cytoskeletal protein gene such as the desmin promoter
(Bolmont et al., 1990; Zhenlin et al., 1989) or a promoter derived from a gene specifically expressed
in epithelial cells and most preferably in olfactory epithelial cells.

Preferred bacterial promoters are the LacI, LacZ, the T3 or T7 bacteriophage RNA
polymerase promoters, the polyhedrin promoter, or the p10 protein promoter from baculovirus (Kit
Novagen) (Smith et al., 1983; O'Reilly et al., 1992), the lambda PR promoter or also the trc
promoter.

Promoter regions can be selected from any desired gene using, for example, CAT
(chloramphenicol transferase) vectors and more preferably pKK232-8 and pCM7 vectors.
Particularly preferred bacterial promoters include lacI, lacZ, T3, T7, gpt, lambda PR, PL and trp.
Eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early and late SV40,
LTRs from retrovirus, and mouse metallothionein-I. Selection of a convenient vector and promoter
is well within the level of ordinary skill in the art.

The choice of a determined promoter among the above-described promoters is well within the
ability of one skilled in the art, guided by his knowledge in the genetic engineering technical field,
and also by the book of Sambrook et al. (1989) or by the procedures described by
Fuller et al. in 1996 (Fuller S.A. et al., 1996).

A preferred constitutive promoter that is used is one of the internal promoters that are active
in the resting fibroblasts such as the promoter of the phosphoglycerate kinase gene (PGK-1). The
PGK-1 promoter is either the mouse promoter or the human promoter such as described by Adra et al.
(1987). Other constitutive promoters may also be used such as the beta-actin promoter (Kort et al.,
1983) or the vimentin promoter (Rettlez and Basenga, 1987).

The vector containing the appropriate DNA sequence as described above, more preferably an
OLF1 to OLF10 coding polynucleotide, can be utilized to transform an appropriate host to allow the
expression of the desired polypeptide or polynucleotide.

Other types of vectors 

The in vivo expression of an olfactory receptor polypeptide encompassed by the invention or
a fragment or a variant thereof may be useful in order to correct a genetic defect related to the
expression of the native gene in a host organism or to the production of biologically active olfactory
receptor proteins.

Consequently, the present invention also deals with recombinant expression vectors mainly
designed for the in vivo production of a therapeutic peptide fragment by the introduction of the
genetic information in the organism of the patient to be treated. This genetic information may be
introduced in vitro in a cell that has been previously extracted from the organism, the modified cell
being subsequently reintroduced in the said organism, or directly in vivo into the appropriate tissue,
preferably the olfactory epithelium.

One specific embodiment for a method for delivering the corresponding protein or peptide to
the interior of a cell of a vertebrate in vivo comprises the step of introducing a preparation
comprising a physiologically acceptable carrier and a naked polynucleotide operatively coding for
the polypeptide into the interstitial space of a tissue comprising the cell, whereby the naked
polynucleotide is taken up into the interior of the cell and has a physiological effect.

In a specific embodiment, the invention provides a composition for the in vivo production of
an olfactory receptor polypeptide described herein containing a naked polynucleotide operatively
coding for an olfactory receptor selected from the group of OLF1 to OLF10 or a fragment or a
variant thereof, in solution in a physiologically acceptable carrier and suitable for introduction into a
tissue to cause cells of the tissue to express the said protein or polypeptide.

Advantageously, the composition described above is administered locally, near the site in 
which the expression of the olfactory receptor polypeptide under consideration or a fragment or a 
variant thereof is sought. 

The polynucleotide operatively coding for an olfactory receptor polypeptide or a fragment or
variant thereof may be a vector comprising the genomic DNA or the complementary DNA (cDNA)
coding for the corresponding protein and a promoter sequence allowing the expression of the
genomic DNA or the complementary DNA in the desired eukaryotic cells, such as vertebrate cells,
specifically mammalian cells.

This vector may also contain one origin of replication that allows it to replicate in the
eukaryotic host cell such as an origin of replication from a bovine papillomavirus. Alternatively, the
vector can contain several, for example two, origins of replication of different origins in order to
allow said vector to replicate in different host cells, typically both in a prokaryotic cell such as E.
coli and in a eukaryotic cell such as a mammalian epithelial cell, preferably a mammalian olfactory
epithelial cell.

Compositions comprising a polynucleotide are described in the PCT application N° WO
90/11092 (Vical Inc.) and also in the PCT application N° WO 95/11307 (Institut Pasteur, INSERM,
Universite d'Ottawa) as well as in the articles of Tacson et al. (1996) and of Huygen et al. (1996).

In another embodiment, the DNA to be introduced is complexed with DEAE-dextran
(Pagano et al., 1967) or with nuclear proteins (Kaneda et al., 1989), with lipids (Felgner et al., 1987)
or encapsulated within liposomes (Fraley et al., 1980).

In another embodiment, the polynucleotide encoding an olfactory receptor polypeptide of
the invention or a fragment or a variant thereof may be included in a transfection system comprising
polypeptides that promote its penetration within the host cells as described in the PCT
application WO 95/10534 (Seikagaku Corporation). It can also be encapsulated in polymer
microparticles as described in the PCT Application No WO 94/27238.

The vector according to the present invention may advantageously be administered in the
form of a gel that facilitates its transfection into the cells. Such a gel composition may be a
complex of poly-L-lysine and lactose, as described by Midoux (1993) or also poloxamer 407 as
described by Pastore (1994). Said vector may also be suspended in a buffer solution or be associated
with liposomes.

The amount of the vector to be injected into the desired host organism varies according to the
site of injection. As an indicative dose, between 0.1 and 100 µg of the vector will be injected into an
animal body, preferably a mammal body, for example a mouse body.

In another embodiment of the vector according to the invention, said vector may be
introduced in vitro in a host cell, preferably in a host cell previously harvested from the animal to be
treated and more preferably a somatic cell such as a muscle cell. In a subsequent step, the cell that
has been transformed with the vector coding for the desired olfactory receptor polypeptide or the
desired fragment or variant thereof is implanted back into the animal body in order to deliver the
recombinant protein within the body either locally or systemically.

Suitable vectors for the in vivo expression of an olfactory receptor polypeptide of the
invention or a fragment or a variant thereof are described hereunder.

In one specific embodiment, the vector is derived from an adenovirus. Preferred
adenovirus vectors according to the invention are those described by Feldman and Steg (1996) or
Ohno et al. (1994). Another preferred recombinant adenovirus according to this specific embodiment
of the present invention is the adenovirus described by Ohwada et al. (1996) or the human
adenovirus type 2 or 5 (Ad 2 or Ad 5) or an adenovirus of animal origin (French patent application
N°FR-93.05954).

Among the adenoviruses of animal origin may be cited the adenoviruses of canine (CAV2,
strain Manhattan or A26/61 [ATCC VR-800]), bovine, murine (Mav1, Beard et al., 1980) or simian
(SAV) origin.

Preferably, the inventors are using recombinant defective adenoviruses that may be prepared
following a technique well known to one skilled in the art, for example as described by Levrero et al.
(1991) or by Graham (1984) or in the European patent application N° EP-185.573. Another
defective recombinant adenovirus that may be used according to the present invention, as well as a
composition of matter containing such a defective recombinant adenovirus, is described in the PCT
application N° WO 95/14785.

Retrovirus vectors and adeno-associated virus vectors are generally understood to be the
recombinant gene delivery system of choice for the transfer of exogenous polynucleotides in vivo,
particularly to mammals, including humans. These vectors provide efficient delivery of genes into
cells, and the transferred nucleic acids are stably integrated into the chromosomal DNA of the host.

The use of recombinant retrovirus vectors containing a nucleic acid according to the
invention is also encompassed within the scope of the invention. A major prerequisite for the use of
retroviruses is to ensure the safety of their use, particularly with regard to the possibility of the
spread of wild-type virus in the cell population. The development of specialized cell lines (termed
"packaging cells") which produce only replication defective retroviruses has increased the utility of
retroviruses for in vivo gene delivery, and defective retroviruses are well characterized for use in
gene transfer. Thus, recombinant retroviruses can be constructed in which a part of the retroviral
coding sequence (gag, pol, env) has been replaced by nucleic acid encoding an olfactory receptor,
rendering the retrovirus defective. Protocols for producing recombinant retroviruses and for
infecting cells in vitro and in vivo with such viruses can be found in "Current Protocols in Molecular
Biology" (1989).

Furthermore, it has been shown that it is possible to limit the infection spectrum of 
retroviruses, and consequently of retroviral-based vectors, by modifying the viral packaging proteins 
on the surface of the viral particle, as described for example in the PCT Application No WO 
93/25234 or in the PCT Application No WO 94/06920. For instance, strategies for the modification 
of the infection spectrum of retroviral vectors include: coupling antibodies specific for cell surface 
antigens to the viral env protein (Julan et al., 1992) or coupling cell surface receptor ligands to the 
viral env protein (Neda et al., 1991). Coupling can be in the form of chemical cross-linking with 
a protein or other variety (e.g. lactose to convert the env protein to an asialoglycoprotein), as well as by 
generating fusion proteins (e.g. single-chain antibody/env fusion proteins). This technique, while 
useful to limit or otherwise direct the infection to certain tissue types, can also be used to convert an 
ecotropic vector into an amphotropic vector. 

Particularly preferred retroviruses for the preparation or construction of retroviral in vitro or 
in vivo gene delivery vehicles of the present invention include retroviruses selected from the group 
consisting of Mink-Cell Focus Inducing Virus, Murine Sarcoma Virus, Reticuloendotheliosis virus 
and Rous Sarcoma virus. Particularly preferred Murine Leukemia Viruses include 4070A and 1504A 
(Hartley et al., 1976), Abelson (ATCC No VR-999), Friend (ATCC No VR-245), Gross (ATCC No 
VR-590), Rauscher (ATCC No VR-998) and Moloney Murine Leukemia Virus (ATCC No VR-190; 
PCT Application No WO 94/24298). Particularly preferred Rous Sarcoma Viruses include Bryan 
high titer (ATCC Nos VR-334, VR-657, VR-726, VR-659 and VR-728). Another preferred 
retroviral vector is that described by Roth et al. (Roth J. A. et al., 1996). 

Yet another viral vector system that is contemplated by the invention consists of the 
adeno-associated virus (AAV). Adeno-associated virus is a naturally occurring defective virus that requires 
another virus, such as an adenovirus or a herpes virus, as a helper virus for efficient replication and a 
productive life cycle (Muzyczka et al., 1992). It is also one of the few viruses that may integrate its 
DNA into non-dividing cells, and exhibits a high frequency of stable integration (Flotte et al., 1992; 
Samulski et al., 1989; McLaughlin et al., 1989). One advantageous feature of AAV derives from its 
reduced efficacy for transducing primary cells relative to transformed cells. 

Cell hosts 

Another object of the invention consists of host cells that have been transformed or 
transfected with one of the polynucleotides described herein, and more precisely a polynucleotide 
comprising the coding sequence of any of the olfactory receptor polypeptides having the amino acid 
sequence of SEQ ID Nos 12-21 or fragments or variants thereof. Included are host cells that are 
transformed (prokaryotic cells) or that are transfected (eukaryotic cells) with a recombinant vector 
such as one of those described above. 

A recombinant host cell of the invention comprises any one of the polynucleotides or the 
recombinant vectors described herein. More particularly, the cell hosts of the present invention can 
comprise any of the polynucleotides described in the "Coding regions of the olfactory receptor gene" 
section, the "Genomic sequence of olfactory receptor gene" section, the "Oligonucleotide Probes And 
Primers" section, the "Polynucleotide constructs" section and the "Expression of an OLF1 to 
OLF10 coding polypeptide" section. 

Suitable prokaryotic hosts for transformation include E. coli, Bacillus subtilis, as well as 
various species within the genera of Streptomyces or Mycobacterium. Suitable eukaryotic hosts 
comprise yeast and insect cells, such as Drosophila and Sf9. Various mammalian cell hosts can also be 
employed to express recombinant protein. Examples of mammalian cell hosts include the COS-7 
lines of monkey kidney fibroblasts (Guzman, 1981), and other cell lines capable of expressing a 
compatible vector, for example the C127, 3T3, CHO, HeLa and BHK cell lines. The selection of a 
host is within the skill of one skilled in the art. 

Preferred cell hosts used as recipients for the expression vectors of the invention are the 
following: 

a) Prokaryotic host cells: Escherichia coli strains (i.e. DH5-α strain) or Bacillus subtilis. 

b) Eukaryotic host cells: HeLa cells (ATCC N°CCL2; N°CCL2.1; N°CCL2.2), CV-1 cells 
(ATCC N°CCL70), COS cells (ATCC N°CRL1650; N°CRL1651), Sf-9 cells (ATCC N°CRL1711). 

The constructs in the host cells can be used in a conventional manner to produce the gene 
product encoded by the recombinant sequence. 

Following transformation of a suitable host and growth of the host to an appropriate cell 
density, the selected promoter is induced by appropriate means, such as temperature shift or 
chemical induction, and cells are cultivated for an additional period. 

Cells are typically harvested by centrifugation, disrupted by physical or chemical means, and 
the resulting crude extract retained for further purification. 

Microbial cells employed in expression of proteins can be disrupted by any convenient 
method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing 
agents. Such methods are well known to the skilled artisan. 

Transgenic animals 

The terms "transgenic animals" or "host animals" are used herein to designate animals that 
have their genome genetically and artificially manipulated so as to include one of the nucleic acids 
according to the invention. Preferred animals are non-human mammals and include those belonging 
to a genus selected from Mus (e.g. mice), Rattus (e.g. rats) and Oryctolagus (e.g. rabbits) which have 
their genome artificially and genetically altered by the insertion of a nucleic acid according to the 
invention. 

The transgenic animals of the invention all include within a plurality of their cells a cloned 
recombinant or synthetic DNA sequence, more specifically one of the purified or isolated nucleic 
acids comprising an olfactory receptor coding sequence selected from the group OLF1 to OLF10, an 
olfactory receptor regulatory polynucleotide or a DNA sequence encoding an antisense 
polynucleotide such as described in the present specification. 

More particularly, transgenic animals according to the invention contain in their somatic 
cells and/or in their germ line cells any of the polynucleotides described in the "Coding regions of 
the olfactory receptor gene" section, the "Genomic sequence of olfactory receptor gene" section, the 
"Oligonucleotide Probes And Primers" section, the "Polynucleotide constructs" section and the 
"Expression of an OLF1 to OLF10 coding polypeptide" section. 

The replacement of the native genomic olfactory receptor sequence by a defective copy of 
said sequence may be performed by techniques of gene targeting. Such techniques are notably 
described by Burright et al. (1997), Bates et al. (1997), Mangiarini et al. (1997) and Davies et al. (1997). 

Second preferred transgenic animals of the invention have the murine olfactory receptor 
gene replaced either by a defective copy of the murine olfactory receptor gene or by an interrupted 
copy of the human olfactory receptor gene. A "defective copy" of a murine or a human olfactory 
receptor gene is intended to designate a modified copy of these genes that is not transcribed, or is 
poorly transcribed, in the resulting recombinant host animal, or a modified copy of these genes 
leading to the absence of synthesis of the corresponding translation product or alternatively leading 
to a modified and/or truncated translation product lacking the biological activity of the wild type 
olfactory receptor protein. The altered translation product thus contains amino acid modifications, 
deletions and substitutions. Modifications and deletions may render the naturally occurring gene 
nonfunctional, thus leading to a "knockout animal". These transgenic animals are critical for the 
creation of animal models of human diseases, and for eventual treatment of disorders related to 
alteration of the olfactory perception of odorant substances or molecules. Examples of such knockout 
mice are described in the PCT Applications Nos WO 97/34641, WO 96/12792 and WO 98/02354. 

The endogenous murine olfactory receptor gene can be interrupted by the insertion, between 
two contiguous nucleotides of said gene, of a part or all of a marker gene placed under the control of 
the appropriate promoter, for example the endogenous promoter of the endogenous murine olfactory 
receptor gene. The marker gene may be the neomycin resistance gene (neo) that may be operably 
linked to the phosphoglycerate kinase-1 (PGK-1) promoter, as described in the PCT Application No 
WO 98/02534. 

Thus, the invention is also directed to transgenic animals containing in their somatic cells 
and/or in their germ line cells a polynucleotide selected from the following group of 
polynucleotides: 

a) a defective copy of the human olfactory receptor gene; 

b) a defective copy of the endogenous olfactory receptor gene, wherein the expression 
"endogenous olfactory receptor gene" designates an olfactory receptor gene that is naturally present 
within the genome of the animal host to be genetically modified. 

The invention also concerns a method for obtaining transgenic animals, wherein said 
method comprises the steps of: 

a) replacing the endogenous copy of the animal olfactory receptor gene by a nucleic acid 
selected from the group consisting of a defective copy of the human olfactory receptor gene and a 
defective copy of the endogenous olfactory receptor gene in animal cells, preferably embryonic stem 
cells (ES); 

b) introducing the recombinant animal cells obtained at step a) into embryos, notably 
blastocysts of the animal; 

c) selecting the resulting transgenic animals, for example by detecting the defective copy of 
an olfactory receptor gene with one or several primers or probes according to the invention. 

Optionally, the transgenic animals may be bred together in order to obtain homozygous 
transgenic animals for the defective copy of the olfactory receptor gene introduced. 
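
As an illustration of this optional breeding step, the expected genotype distribution of the offspring can be sketched as follows. This is a minimal sketch only: it assumes simple Mendelian inheritance at a single autosomal locus and two parents that are each heterozygous for the defective copy. The allele labels "+" (wild type) and "d" (defective copy) and the function name are hypothetical and are not taken from the present specification.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotype_fractions(parent_a, parent_b):
    """Expected genotype fractions for a single-locus cross.

    Each parent is given as a two-character string of alleles,
    e.g. "+d" for an animal heterozygous for the defective copy.
    """
    # Enumerate every equally likely pairing of one allele per parent.
    counts = Counter(
        "".join(sorted(pair)) for pair in product(parent_a, parent_b)
    )
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# Cross of two heterozygous animals (assumed genotypes).
fractions = offspring_genotype_fractions("+d", "+d")
homozygous_defective = fractions["dd"]
```

Under these assumptions about one quarter of the offspring would be expected to be homozygous for the defective copy, which is why the selection of step c) would normally be repeated on each litter.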

The transgenic animals of the invention thus contain specific sequences of exogenous 
genetic material such as the nucleotide sequences described above in detail. 

In a preferred embodiment, these transgenic animals may be good experimental models in 
order to study the diverse pathologies related to disorders associated with alteration of the olfactory 
perception of odorant substances or molecules, in particular concerning the transgenic animals 
within the genome of which has been inserted one or several copies of a polynucleotide encoding a 
native olfactory receptor protein, or alternatively a mutant olfactory receptor protein. 

Third preferred transgenic animals according to the invention contain in their somatic cells 
and/or in their germ line cells a polynucleotide selected from the following group of polynucleotides: 

a) a purified or isolated nucleic acid encoding an olfactory receptor polypeptide selected 
from OLF1 to OLF10, or a polypeptide fragment or variant thereof; 

b) a purified or isolated nucleic acid comprising at least 8 consecutive nucleotides of the 
nucleotide sequence SEQ ID No 1, a nucleotide sequence complementary thereto or a fragment 
or a variant thereof; 

c) a purified or isolated nucleic acid comprising a nucleotide sequence selected from the 
group of SEQ ID Nos 2-11, a sequence complementary thereto or a fragment or a variant thereof. 

The transgenic animals of the invention thus contain specific sequences of exogenous 
genetic material such as the nucleotide sequences described above in detail. 

In a first preferred embodiment, these transgenic animals may be good experimental models 
in order to screen the candidate substance of interest interacting with the olfactory receptor under 
consideration. 

Since it is possible to produce transgenic animals of the invention using a variety of different 
sequences, a general description will be given of the production of transgenic animals by referring 
generally to exogenous genetic material. This general description can be adapted by those skilled in 
the art in order to incorporate the DNA sequences into animals. For more details regarding the 
production of transgenic animals, and specifically transgenic mice, reference may be made to Sandou et 
al. (1994) and also to US Patents Nos 4,873,191, issued Oct. 10, 1989, 5,968,766, issued Dec. 16, 
1997 and 5,387,742, issued Feb. 28, 1995. 

Transgenic animals of the present invention are produced by the application of procedures 
which result in an animal with a genome that incorporates exogenous genetic material which is 
integrated into the genome. The procedure involves obtaining the genetic material, or a portion 
thereof, which encodes either a coding sequence, a non-coding polynucleotide or a DNA sequence 
encoding an antisense polynucleotide of an olfactory receptor selected from the group OLF1 to 
OLF10 such as described in the present specification. 

A recombinant polynucleotide of the invention is inserted into an embryonic stem (ES) cell 
line. The insertion is made using electroporation. The cells subjected to electroporation are screened 
(e.g. by Southern blot analysis) to find positive cells which have integrated the exogenous recombinant 
polynucleotide into their genome. An illustrative positive-negative selection procedure that may be 
used according to the invention is described by Mansour et al. (1988). Then, the positive cells are 
isolated, cloned and injected into 3.5-day-old blastocysts from mice. The blastocysts are then 
inserted into a female host animal and allowed to grow to term. The offspring of the female host 
are tested to determine which animals are transgenic, i.e. include the inserted exogenous DNA 
sequence, and which are wild-type. 

Thus, the present invention also concerns a transgenic animal containing a nucleic acid, a 
recombinant expression vector or a recombinant host cell according to the invention. 

Recombinant Cell Lines Derived From The Transgenic Animals Of The Invention. 

A further object of the invention comprises recombinant host cells obtained from a 
transgenic animal described herein. In one embodiment the invention encompasses cells derived 
from non-human host mammals and animals comprising a recombinant vector of the invention or an 
olfactory receptor gene disrupted by homologous recombination with a knock out vector. 

Recombinant cell lines may be established in vitro from cells obtained from any tissue of a 
transgenic animal according to the invention, for example by transfection of primary cell cultures 
with vectors expressing onc-genes such as SV40 large T antigen, as described by Chou (1989) and 
Shay et al. (1991). 

F. METHODS FOR SCREENING SUBSTANCES OR MOLECULES 
INTERACTING WITH AN OLFACTORY RECEPTOR PROTEIN 

The present invention pertains to methods for screening substances of interest, in particular 
odorant substances or molecules that interact with an olfactory receptor protein selected from the 
group consisting of OLF1 to OLF10, or one peptide fragment or variant thereof. In one embodiment, 
the candidate substance is devoid of odorant properties but is able to bind the olfactory receptor and 
to trigger the transduction of signals. 

For the purpose of the present invention, a ligand means a molecule, such as a protein, a 
peptide, an antibody or any synthetic chemical compound, capable of binding to the olfactory 
receptor protein or one of its fragments or variants or of modulating the expression of the 
polynucleotide coding for the olfactory receptor or a fragment or variant thereof. 

In the ligand screening method according to the present invention, a biological sample or a 
defined molecule to be tested as a putative ligand of the olfactory receptor protein is brought into 
contact with the corresponding purified olfactory receptor protein, for example the corresponding 
purified recombinant olfactory receptor protein produced by a recombinant cell host as described 
herein, in order to form a complex between this protein and the putative ligand molecule to be tested. 

As an illustrative example, to study the interaction of the olfactory receptor protein, or a 
fragment comprising any of the fragments described in the section "OLF1 to OLF10 
proteins and polypeptide fragments", with drugs or small molecules, such as molecules generated 
through combinatorial chemistry approaches, the microdialysis coupled to HPLC method described 
by Wang et al. (1997) or the affinity capillary electrophoresis method described by Bush et al. 
(1997) can be used. 

In further methods, peptides, drugs, fatty acids, lipoproteins, or small molecules which 
interact with the olfactory receptor protein, or a fragment comprising any of the fragments described 
in the section "OLF1 to OLF10 proteins and polypeptide fragments", may be identified using assays 
such as the following. The molecule to be tested for binding is labeled with a detectable label, such 
as a fluorescent, radioactive, or enzymatic tag, and placed in contact with immobilized olfactory 
receptor protein, or a fragment thereof, under conditions which permit specific binding to occur, such 
as affinity columns. In some embodiments, chimeric proteins containing the olfactory receptor 
protein fused to proteins facilitating purification, such as glutathione S-transferase (GST), are used. 
After removal of non-specifically bound molecules, bound molecules are detected using appropriate 
means. 

In one embodiment, proteins, peptides, carbohydrates, lipids, or small molecules generated 
by combinatorial chemistry interacting with the olfactory receptor protein, or a fragment or a variant 
thereof can also be screened by using an Optical Biosensor as described in Edwards and 
Leatherbarrow (1997) and also in Szabo et al. (1995). The main advantage of the method is that it 
allows the determination of the association rate between the olfactory receptor protein and molecules 
interacting with the olfactory receptor protein. It is thus possible to select specifically ligand 
molecules interacting with the olfactory receptor protein, or a fragment thereof, through strong or 
conversely weak association constants. 
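
The relationship between measured rates and the association constants mentioned above can be sketched as follows. This is a minimal sketch that assumes a simple 1:1 binding model, which the present specification does not itself require; the function names and the numerical rate constants are hypothetical and serve only as an illustration.

```python
def equilibrium_constants(k_on, k_off):
    """Return (K_A, K_D) for an assumed 1:1 binding model.

    k_on  : association rate constant, in M^-1 s^-1
    k_off : dissociation rate constant, in s^-1
    """
    if k_on <= 0 or k_off <= 0:
        raise ValueError("rate constants must be positive")
    k_a = k_on / k_off  # association constant, in M^-1
    k_d = k_off / k_on  # dissociation constant, in M
    return k_a, k_d


def observed_rate(k_on, k_off, ligand_conc):
    """Pseudo-first-order binding rate (s^-1) at a fixed ligand concentration (M)."""
    return k_on * ligand_conc + k_off


# Hypothetical rate constants, for illustration only.
k_a, k_d = equilibrium_constants(k_on=1.0e5, k_off=1.0e-2)
k_obs = observed_rate(k_on=1.0e5, k_off=1.0e-2, ligand_conc=1.0e-6)
```

Ranking candidate molecules by the resulting association constant is one way in which strong or weak binders could then be selected.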

Another object of the present invention comprises methods and kits for the screening of 
candidate substances that interact with an olfactory receptor polypeptide. 

The present invention pertains to methods for screening substances of interest that interact 
with an olfactory receptor protein or one fragment or variant thereof. By their capacity to bind 
covalently or non-covalently to an olfactory receptor protein or to a fragment or variant thereof, 
these substances or molecules may be advantageously used both in vitro and in vivo. In vitro, said 
interacting molecules may be used as detection means in order to identify the presence of an 
olfactory receptor protein in a sample, preferably a biological sample. 

A first method for the screening of a candidate substance interacting with an olfactory 
receptor polypeptide selected from the group consisting of SEQ ID Nos 12-21, or fragments or 
variants thereof, comprises the following steps: 

a) providing a polypeptide selected from the group consisting of the polypeptides 
comprising, consisting essentially of, or consisting of the amino acid sequences of SEQ ID 
Nos 12-21, or a peptide fragment or a variant thereof; 

b) obtaining a candidate substance; 

c) bringing into contact said polypeptide with said candidate substance; and 

d) detecting the complexes formed between said polypeptide and said candidate 
substance. 

Various candidate substances or molecules can be assayed for interaction with an olfactory 
receptor polypeptide. These substances or molecules include, without being limited to, natural or 
synthetic organic compounds or molecules of biological origin such as polypeptides. When the 
candidate substance or molecule comprises a polypeptide, this polypeptide may be the resulting 
expression product of either a phage clone belonging to a phage-based random peptide library, or of 
a cDNA library cloned in a vector suitable for performing a two-hybrid screening assay. 

In one embodiment of the screening method defined above, the complexes formed between 
the polypeptide and the candidate substance are further incubated in the presence of a polyclonal or a 
monoclonal antibody that specifically binds to the olfactory receptor protein of the invention under 
consideration or to said peptide fragment or variant thereof. 

In another embodiment of the present screening method, increasing concentrations of a 
substance competing for binding to the olfactory receptor with the considered candidate substance are 
added, simultaneously with or prior to the addition of the candidate substance or molecule, when 
performing step c) of said method. By this technique, the detection and optionally the quantification 
of the complexes formed between the olfactory receptor protein or the peptide fragment or variant 
thereof and the candidate substance or molecule to be screened allows one skilled in the art to 
determine the affinity value of said substance or molecule for said olfactory receptor protein or the 
peptide fragment or variant thereof. 
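
How an affinity value may be derived from such a competition experiment can be sketched as follows. This is a minimal sketch that assumes reversible, one-site competitive binding at equilibrium and uses the standard Cheng-Prusoff relation; neither this model nor the function names and numerical values are specified in the present application, and they are given for illustration only.

```python
def fraction_bound(ligand, kd, competitor=0.0, ki=None):
    """Fractional receptor occupancy by the detected ligand.

    Assumes one-site competitive binding at equilibrium.
    All concentrations and constants are in mol/l.
    """
    if competitor and ki is None:
        raise ValueError("ki is required when a competitor is present")
    # A competitor raises the apparent dissociation constant of the ligand.
    apparent_kd = kd * (1.0 + competitor / ki) if competitor else kd
    return ligand / (ligand + apparent_kd)


def cheng_prusoff_ki(ic50, ligand, kd):
    """Inhibition constant of the competitor from its measured IC50."""
    return ic50 / (1.0 + ligand / kd)


# Hypothetical concentrations, for illustration only.
without_competitor = fraction_bound(ligand=1e-8, kd=1e-8)
with_competitor = fraction_bound(ligand=1e-8, kd=1e-8, competitor=1e-7, ki=1e-7)
ki_estimate = cheng_prusoff_ki(ic50=2e-7, ligand=1e-8, kd=1e-8)
```

In this sketch the occupancy falls as the competitor concentration rises, which corresponds to the dose-dependent behaviour on which the affinity determination relies.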

The olfactory receptor selected from the group consisting of OLF1 to OLF10, or a peptide 
fragment or a variant thereof, can be overexpressed and purified in a bacterial system such as E. coli 
as described in Kiefer et al. (1996) and Tucker et al. (1996). The olfactory receptor coding sequence 
can be fused at its N-terminus with GST (Glutathione S-transferase) or MBP (Maltose Binding 
Protein) and at its C-terminus with a poly-histidine tag, Bio tag or Strep tag for facilitating the 
purification of the expressed protein. The Bio tag is 13 amino acid residues long, is biotinylated in 
vivo in E. coli, and will therefore bind to both avidin and streptavidin. The Strep tag is 9 amino acid 
residues long and binds specifically to streptavidin, but not to avidin. Therewith, a purification step 
by affinity can be carried out based on the interaction of a poly-histidine tail with immobilized metal 
ions, of the biotinylated Bio tag with monomeric avidin, of the Strep tag with streptavidin, of the 
GST segment with glutathione, or of the MBP segment with maltose. Thioredoxin can 
optionally be inserted between the receptor C-terminus and the tag and could increase the expression 
level. The fusion protein is solubilized in 1 % N-lauroyl sarcosine, and 0.2 % digitonin is added. It is 
purified by affinity chromatography. The MBP, GST or tag segment can then be removed. After the 
olfactory receptor protein purification, sarcosyl can be replaced with digitonin, which is a detergent 
widely used to stabilize the G protein-coupled receptors. The purified receptor is reconstituted into 
lipid vesicles preferably composed of phosphatidylcholine:phosphatidylglycerol (4:1) by adding the 
lipid dissolved in dodecyl maltoside and removing the detergent. 

The olfactory receptor selected from the group consisting of OLF1 to OLF10, or a peptide 
fragment or a variant thereof, can also be overexpressed and purified in a baculovirus/Sf9 system as 
described in Nekrasova et al. (1996). The olfactory receptor gene, or a fragment thereof, is 
preferably expressed with a "flag" peptide epitope tag and/or a poly-histidine tag at either its N- or 
C-terminus for facilitating the purification of the expressed protein. Therefore, the olfactory receptor 
gene, or a fragment or a variant thereof, is preferably subcloned into the baculovirus transfer vector 
pAcSGHisNT to create constructs that encode an olfactory receptor with an amino-terminal 
poly-histidine tag. The resulting transfer vector is transfected preferably with BaculoGold DNA into Sf9 
cells. The expressed olfactory receptors are then solubilized either in 1 % N-lauroyl sarcosine or 1.5 
% lysophosphatidylcholine, but preferably in lysophosphatidylcholine. After solubilization, the 
olfactory receptors are purified by affinity chromatography on nickel nitrilotriacetic acid resin and 
by cation-exchange chromatography with a carboxymethyl Sepharose cation-exchange column. The 
tag segment can then be removed. The purified receptor is reconstituted into lipid vesicles preferably 
composed of dimyristoylglycerophosphocholine, cholesterol, dipalmitoylglycerophosphoserine and 
dipalmitoylglycerophosphoethanolamine (in a molar ratio of 54:35:10:1). 
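
The four-component lipid ratio given above can be converted into per-lipid amounts as sketched below. This is a minimal sketch; the total lipid amount used in the example is a hypothetical value, since the present specification does not state one, and the function names are illustrative.

```python
# Ratio of the four vesicle lipids as given in the text (54:35:10:1).
LIPID_MOLAR_RATIO = {
    "dimyristoylglycerophosphocholine": 54,
    "cholesterol": 35,
    "dipalmitoylglycerophosphoserine": 10,
    "dipalmitoylglycerophosphoethanolamine": 1,
}


def mole_fractions(ratio):
    """Convert ratio parts into mole fractions that sum to 1."""
    total_parts = sum(ratio.values())
    return {name: parts / total_parts for name, parts in ratio.items()}


def moles_needed(ratio, total_micromoles):
    """Micromoles of each lipid for a chosen total lipid amount."""
    return {
        name: fraction * total_micromoles
        for name, fraction in mole_fractions(ratio).items()
    }


lipid_fractions = mole_fractions(LIPID_MOLAR_RATIO)
# Hypothetical total of 10 micromoles of lipid, for illustration only.
amounts = moles_needed(LIPID_MOLAR_RATIO, total_micromoles=10.0)
```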

Once the olfactory receptor protein or one of its peptide fragments or variants has been 
obtained as described above, candidate substances or molecules can then be assayed for their 
capacity to bind thereto. 

The candidate substance or molecule to be assayed for interacting with an olfactory receptor 
of the invention may be of diverse nature, including, without being limited to, natural or synthetic 
organic compounds or molecules of biological origin such as peptides. It can comprise aromatic or 
aliphatic compounds with various functional groups such as alcohol, aldehyde, ester, ether, ketone, 
carboxylic acid or amine groups. An example of a substance panel which can be used is provided by 
Zhao et al. (1998). 

The screening of substances or molecules interacting with an olfactory receptor, or a 
fragment thereof, is carried out by photoaffinity labeling experiments described in Kiefer et al. 
(1996). The odorant is labeled, preferably radiolabeled, and incubated with lipid vesicles including 
the purified olfactory receptor. The odorants bound to the olfactory receptors are crosslinked by 
exposure to ultraviolet light. Then, the samples are subjected to SDS polyacrylamide gel 
electrophoresis. Proteins are visualized by Coomassie-blue staining and the odorants are revealed, 
preferably by autoradiography. In another embodiment, the proteins can be visualized by Western 
Blot with a polyclonal or monoclonal antibody that specifically binds to the olfactory receptor under 
consideration. Once a substance binding to the considered olfactory receptor is identified, the 
binding specificity of this substance is confirmed with competition experiments demonstrating that 
increasing concentrations of unlabeled ligand accomplish a dose-dependent displacement of the 
radioactive ligand. 

The identification of a first substance specific to one of the olfactory receptors of the present 
invention facilitates the screening of other substances. Indeed, the binding capacity of the screened 
substances to this olfactory receptor can be assessed through competition experiments against 
the first identified substance, which is labeled. 

The invention also pertains to kits useful for performing the hereinbefore described 
screening method. Preferably, such kits comprise a polypeptide selected from the group consisting of 
the polypeptides comprising the amino acid sequences SEQ ID Nos 12-21 or a peptide fragment or a 
variant thereof, and optionally means useful to detect the complex formed between the considered 
olfactory receptor polypeptide or its peptide fragment or variant and the candidate substance. In a 
preferred embodiment, the kit can comprise an already identified substance specific for the olfactory 
receptor under consideration which is labeled, preferably radiolabeled, and a monoclonal or 
polyclonal antibody directed against the considered olfactory receptor. 

A second screening method embodiment consists of a method for the screening of ligand 
molecules interacting with an olfactory receptor polypeptide selected from the group consisting of 
SEQ ID Nos 12-21, wherein said method comprises: 

a) providing a recombinant eukaryotic host cell containing a nucleic acid encoding a 
polypeptide selected from the group comprising, consisting essentially of, or consisting of the 
polypeptides comprising the amino acid sequences SEQ ID Nos 12-21, or variants or 
fragments thereof; 

b) preparing membrane extracts of said recombinant eukaryotic host cell; 

c) bringing into contact the membrane extracts prepared at step b) with a selected 
ligand molecule; and 

d) detecting the production level of second messenger metabolites. 

The baculovirus-Sf9 cell system enables a foreign DNA encoding an olfactory receptor 
selected from the group consisting of OLF1 to OLF10, or a peptide fragment or a variant thereof, to 
be expressed with high efficiency. Moreover, it can be used to couple a heterologously expressed 
olfactory receptor to the second messenger cascades. Therefore, the binding specificity of an 
olfactory receptor can be assessed through an assay of odorant-induced generation of cAMP or 
inositol triphosphate (InsP3) described in Raming et al. (1993). 

Briefly, a cell line derived from Sf9 is infected by baculovirus, such as baculovirus transfer 
vector pVL1393, harboring DNA encoding the olfactory receptor or a fragment thereof downstream 
from a strong promoter, preferably the polyhedrin promoter. Recombinant viruses are purified and 
used to infect 1.5 x 10^8 Sf9 cells in 100 ml spinner cultures at high multiplicity of infection. Cells are 
collected after a postinfection delay, preferably 48 h, and membrane fractions are isolated as follows. 

Cells are pelleted (at 250g for 10 min at 4°C), washed with Ringer solution (120 mM NaCl, 
5 mM KCl, 1.6 mM K2HPO4, 1.2 mM MgSO4, 25 mM NaHCO3, 5 mM glucose, pH 7.4) and 
disrupted using a glass homogenizer in homogenization buffer (10 mM Tris-HCl, pH 8.0, 2 mM 
EGTA, 3 mM MgCl2) containing antiproteases. The homogenate is centrifuged and the pellet is 
washed. Supernatants are centrifuged at 33,000g for 20 min. The final pellet is resuspended in 
homogenization buffer and the protein concentration is determined. 
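
The preparation of the homogenization buffer from concentrated stocks can be sketched with the usual dilution relation C1 x V1 = C2 x V2. This is a minimal sketch: the final concentrations are those given above, whereas the stock concentrations and the 100 ml batch volume are hypothetical assumptions, and pH adjustment and the addition of antiproteases are not modeled.

```python
# Final concentrations (mM) of the homogenization buffer, as given in the text.
HOMOGENIZATION_BUFFER_MM = {"Tris-HCl": 10.0, "EGTA": 2.0, "MgCl2": 3.0}

# Assumed stock concentrations (mM); these are NOT taken from the specification.
ASSUMED_STOCKS_MM = {"Tris-HCl": 1000.0, "EGTA": 500.0, "MgCl2": 1000.0}


def stock_volumes_ml(final_mm, stock_mm, final_volume_ml):
    """Volume of each stock (ml) needed for the requested final volume."""
    volumes = {}
    for name, target in final_mm.items():
        stock = stock_mm[name]
        if target > stock:
            raise ValueError(f"stock of {name} is too dilute")
        # C1 * V1 = C2 * V2, solved for the stock volume V1.
        volumes[name] = target * final_volume_ml / stock
    # The remainder of the batch is made up with water.
    volumes["water"] = final_volume_ml - sum(volumes.values())
    return volumes


# Hypothetical 100 ml batch.
volumes = stock_volumes_ml(HOMOGENIZATION_BUFFER_MM, ASSUMED_STOCKS_MM, 100.0)
```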

Assay of odorant substance-induced generation of second messengers cAMP and InsP3 is 
performed as follows. Suspensions of Sf9 cell membrane preparations (300 µg protein) are rapidly 
mixed with a stimulation buffer (200 mM NaCl, 10 mM EGTA, 50 mM MOPS, 2.5 mM MgCl2, 1 
mM DTT, 0.05 % Na-cholate, 1 mM ATP, 1 µM GTP, and 0.02 nM free Ca2+) containing the 
candidate substances at the appropriate concentrations. The reaction is stopped after a short time, 
preferably 1 sec, by injecting 10 % perchloric acid. Quenched samples are assayed for second 
messenger concentrations. The quenched and cooled samples are vortexed and centrifuged for 5 min 
at 2500g at 4°C. 400 µl of the supernatants are transferred to a separate tube containing 100 µl of 10 
mM EDTA (pH 7). The samples are neutralized by adding 500 µl of a 1:1 (v/v) mixture of 1,1,2 
trichlorofluoroethane, followed by thorough mixing. After centrifugation for 2 min at 500g, three 
phases are obtained. The upper phase, which contains all water soluble components, is used for 
carrying out the concentration measurements. cAMP and InsP3 concentrations are determined 
according to the procedures of Steiner et al. (1972) and Palmer et al. (1989), respectively. 

The invention also concerns a kit for the screening of odorant ligand molecules interacting
with an olfactory receptor polypeptide selected from the group consisting of the polypeptides
comprising the amino acid sequences SEQ ID Nos 12-21, wherein said kit comprises:

a) a recombinant eukaryotic host cell containing a nucleic acid encoding a
polypeptide selected from the group comprising, consisting essentially of, or consisting of
the polypeptides comprising the amino acid sequences SEQ ID Nos 12-21 or variants or
fragments thereof; and

b) optionally, reagents necessary for the measurement of second messenger
metabolites in a sample.

The screening of substances or molecules interacting with an olfactory receptor, or a
fragment thereof, can also be carried out through the measurement of the increase of the response to
odorants in an olfactory epithelium overexpressing an olfactory receptor selected from the group
consisting of OLF1 to OLF10, or a peptide fragment or a variant thereof, as described in Zhao et al.
(1998). The response is assessed by electro-olfactogram, which measures a transepithelial potential
resulting from the summed activity of many olfactory neurons. In order to overexpress the olfactory
receptor, or a fragment thereof, in an olfactory epithelium, an adenovirus containing the olfactory
receptor gene is generated. To aid in electro-olfactogram electrode placement, the olfactory
receptor coding sequence is preferably combined in the adenovirus with the physiological marker
green fluorescent protein (GFP) in such a manner that the two proteins are simultaneously expressed.
The olfactory epithelium of an animal, preferably of a rat, is infected by the adenovirus. Animals are
killed 3 to 8 days after infection and the nasal cavity is opened, exposing the medial surface of the
nasal turbinates. Under fluorescent illumination, the GFP clearly marks the pattern of viral
infection and olfactory receptor expression. Odorant substances are applied to the olfactory
epithelium in the vapor phase by injecting a pressurized pulse of odorant vapor into a continuous
stream of humidified clean air. Electro-olfactogram recordings are obtained with a glass capillary
electrode placed on the surface of the epithelium and connected to a differential amplifier. The
olfactory receptor specificity is assessed from the increase of response in infected animals compared
to uninfected animals. To account for the variability between animals, a standard odorant to which
all other odorant responses are normalized is used.
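
The normalization and comparison described above can be sketched as follows. This is only an illustrative calculation, not the analysis of Zhao et al. (1998): the function names, the odorant labels and the amplitudes are hypothetical, and the sketch assumes that each animal yields one electro-olfactogram amplitude per odorant, including the standard odorant.

```python
from statistics import mean


def normalize_to_standard(responses_mv, standard):
    """Express each amplitude as a fraction of the same animal's standard-odorant response."""
    ref = responses_mv[standard]
    if ref == 0:
        raise ValueError("standard odorant response must be non-zero")
    return {odorant: amp / ref for odorant, amp in responses_mv.items()}


def fold_increase(infected, uninfected, standard):
    """Mean normalized response of infected animals divided by that of uninfected animals."""
    inf = [normalize_to_standard(animal, standard) for animal in infected]
    ctl = [normalize_to_standard(animal, standard) for animal in uninfected]
    odorants = [o for o in inf[0] if o != standard]
    return {o: mean(a[o] for a in inf) / mean(a[o] for a in ctl) for o in odorants}


# Hypothetical amplitudes in mV; one dictionary per animal.
infected = [
    {"standard": 10.0, "odorant_A": 15.0, "odorant_B": 5.0},
    {"standard": 8.0, "odorant_A": 12.0, "odorant_B": 4.0},
]
uninfected = [
    {"standard": 10.0, "odorant_A": 5.0, "odorant_B": 5.0},
    {"standard": 12.0, "odorant_A": 6.0, "odorant_B": 6.0},
]
ratios = fold_increase(infected, uninfected, "standard")
```

In this made-up data set, a ratio well above 1 for one odorant and close to 1 for another would point to the first odorant as a candidate ligand of the overexpressed receptor.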

A third screening method embodiment consists of a method for the screening of ligand
molecules interacting with an olfactory receptor polypeptide selected from the group consisting of
the polypeptides comprising the amino acid sequences SEQ ID Nos 12-21, wherein said method
comprises:

a) providing an adenovirus containing a nucleic acid encoding a polypeptide selected
from the group comprising, consisting essentially of, or consisting of the polypeptides
comprising the amino acid sequences SEQ ID Nos 12-21, or variants or fragments thereof;

b) infecting an olfactory epithelium with said adenovirus;

c) bringing into contact the olfactory epithelium of step b) with a selected ligand molecule;
and

d) detecting the increase of the response to said ligand molecule.

G. METHODS FOR INHIBITING THE EXPRESSION OF AN OLFACTORY RECEPTOR GENE

Other therapeutic compositions according to the present invention advantageously comprise
an oligonucleotide fragment of the olfactory receptor nucleic acid sequence, used as an antisense
tool or a triple helix tool that inhibits the expression of the corresponding olfactory receptor gene. A
preferred fragment of the olfactory receptor nucleic acid sequence comprises an allele of at least one of
the biallelic markers A1 to A13.

Antisense Approach 

Preferred methods using antisense polynucleotides according to the present invention are the
procedures described by Sczakiel et al. (1995).

Preferred antisense polynucleotides are described in the section entitled "Nuclear Antisense
DNA Constructs".

The antisense nucleic acids should have a length and melting temperature sufficient to
permit formation of an intracellular duplex having sufficient stability to inhibit the expression of the
olfactory receptor mRNA in the duplex. Strategies for designing antisense nucleic acids suitable for
use in gene therapy are disclosed in Green et al. (1986) and Izant and Weintraub (1984).

In some strategies, antisense molecules are obtained by reversing the orientation of the
olfactory receptor coding region with respect to a promoter so as to transcribe the opposite strand
from that which is normally transcribed in the cell. The antisense molecules may be transcribed
using in vitro transcription systems such as those which employ T7 or SP6 polymerase to generate
the transcript. Another approach involves transcription of olfactory receptor antisense nucleic acids
in vivo by operably linking DNA containing the antisense sequence to a promoter in a suitable
expression vector.
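
As an illustration of what reversing the orientation of a coding region yields, the short sketch below derives the antisense transcript from a sense (coding) strand. The function names and the six-base example are hypothetical placeholders and are not taken from the sequence listing.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the opposite DNA strand, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def antisense_rna(coding_strand):
    """RNA transcribed when the coding region is placed in reverse orientation.

    The result is complementary to the normal mRNA and can therefore pair with it.
    """
    return reverse_complement(coding_strand).upper().replace("T", "U")


example_sense = "ATGGCC"  # placeholder, 5' to 3'
example_antisense = antisense_rna(example_sense)
```

The normal mRNA of the placeholder would read AUGGCC, and the antisense transcript GGCCAU pairs with it base for base when aligned antiparallel.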

Alternatively, suitable antisense strategies are those described by Rossi et al. (1991), in the
International Applications Nos. WO 94/23026, WO 95/04141, WO 92/18522 and in the European
Patent Application No. EP 0 572 287 A2.

An alternative to the antisense technology that is used according to the present invention
comprises using ribozymes that will bind to a target sequence via their complementary
polynucleotide tail and that will cleave the corresponding RNA by hydrolyzing its target site
(namely "hammerhead ribozymes"). Briefly, the simplified cycle of a hammerhead ribozyme
comprises (1) sequence specific binding to the target RNA via complementary antisense sequences;
(2) site-specific hydrolysis of the cleavable motif of the target strand; and (3) release of cleavage
products, which gives rise to another catalytic cycle. Indeed, the use of long-chain antisense
polynucleotides (at least 30 bases long) or ribozymes with long antisense arms is advantageous. A
preferred delivery system for antisense ribozymes is achieved by covalently linking these antisense
ribozymes to lipophilic groups or by using liposomes as a convenient vector. Preferred antisense
ribozymes according to the present invention are prepared as described by Sczakiel et al. (1995), the
specific preparation procedures being referred to in said article.

Triple Helix Approach 

The olfactory receptor genomic DNA may also be used to inhibit the expression of the
olfactory receptor gene based on intracellular triple helix formation.

Triple helix oligonucleotides are used to inhibit transcription from a genome. They are
particularly useful for studying alterations in cell activity when it is associated with a particular
gene.

Similarly, a portion of the olfactory receptor genomic DNA can be used to study the effect
of inhibiting olfactory receptor transcription within a cell. Traditionally, homopurine sequences
were considered the most useful for triple helix strategies. However, homopyrimidine sequences can
also inhibit gene expression. Such homopyrimidine oligonucleotides bind to the major groove at
homopurine:homopyrimidine sequences. Thus, both types of sequences from the olfactory receptor
genomic DNA are contemplated within the scope of this invention.

To carry out gene therapy strategies using the triple helix approach, the sequences of the
olfactory receptor genomic DNA are first scanned to identify 10-mer to 20-mer homopyrimidine or
homopurine stretches which could be used in triple-helix based strategies for inhibiting olfactory
receptor expression. Following identification of candidate homopyrimidine or homopurine
stretches, their efficiency in inhibiting olfactory receptor expression is assessed by introducing
varying amounts of oligonucleotides containing the candidate sequences into tissue culture cells
which express the olfactory receptor gene.
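
The scanning step can be sketched with a regular expression search, shown below. This is a minimal illustration under stated assumptions: the function name and the demonstration sequence are hypothetical, and only maximal runs of at least 10 bases are reported, so that 10-mer to 20-mer candidates can then be drawn from any run that is longer than 20 bases.

```python
import re


def find_homo_runs(seq, min_len=10):
    """Return (start, kind, stretch) for each maximal homopurine or homopyrimidine run.

    start is a 0-based offset into seq; runs shorter than min_len are ignored.
    """
    seq = seq.upper()
    hits = []
    for kind, bases in (("homopurine", "[AG]"), ("homopyrimidine", "[CT]")):
        # Greedy matching from left to right yields maximal, non-overlapping runs.
        for match in re.finditer("%s{%d,}" % (bases, min_len), seq):
            hits.append((match.start(), kind, match.group()))
    return sorted(hits)


# Hypothetical sequence containing one run of each kind.
demo = "ACGT" + "AGGAGAAGGAGA" + "TCAG" + "CTTCCTCTTCCTCT" + "GACT"
candidates = find_homo_runs(demo)
```

Each reported stretch is only a candidate; as stated above, its inhibitory efficiency still has to be assessed in cells expressing the gene.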

The oligonucleotides can be introduced into the cells using a variety of methods known to
those skilled in the art, including but not limited to calcium phosphate precipitation, DEAE-Dextran,
electroporation, liposome-mediated transfection or native uptake.

Treated cells are monitored for altered cell function or reduced olfactory receptor expression
using techniques such as Northern blotting, RNase protection assays, or PCR based strategies to
monitor the transcription levels of the olfactory receptor gene in cells which have been treated with
the oligonucleotide.

The oligonucleotides which are effective in inhibiting gene expression in tissue culture cells
may then be introduced in vivo using the techniques described above in the antisense approach at a
dosage calculated based on the in vitro results, as described in the antisense approach.

In some embodiments, the natural (beta) anomers of the oligonucleotide units can be
replaced with alpha anomers to render the oligonucleotide more resistant to nucleases. Further, an
intercalating agent such as ethidium bromide, or the like, can be attached to the 3' end of the alpha
oligonucleotide to stabilize the triple helix. For information on the generation of oligonucleotides
suitable for triple helix formation see Griffin et al. (1989).

H. COMPUTER-RELATED EMBODIMENTS

As used herein the term "nucleic acid codes of the invention" encompasses the nucleotide
sequences comprising, consisting essentially of, or consisting of any of the polynucleotides
described in the "Coding Regions of the olfactory receptor gene" section, "Genomic sequence of the
olfactory receptor gene" section and the "Oligonucleotide Probes And Primers" section, or variants
thereof, or complementary sequences thereto. Homologous sequences refer to sequences having at
least 99%, 98%, 97%, 96%, 95%, 90%, 85%, 80%, or 75% homology to these contiguous spans.
Homology may be determined using any method described herein, including BLAST2N with the
default parameters or with any modified parameters. Homologous sequences also may include RNA
sequences in which uridines replace the thymines in the nucleic acid codes of the invention.

As used herein the term "polypeptide codes of the invention" encompasses the polypeptide
sequences comprising any of the polypeptides described in the "OLF1 to OLF10 proteins and
polypeptide fragments" section.

It will be appreciated that the nucleic acid and polypeptide codes of the invention can be
represented in the traditional single character format or three letter format respectively (See the inside
back cover of Stryer, Lubert. Biochemistry, 3rd edition. W. H. Freeman & Co., New York.) or in any
other format or code which records the identity of the nucleotides or the amino acids respectively in a
sequence.

It will be appreciated by those skilled in the art that the nucleic acid codes of the invention and
polypeptide codes of the invention can be stored, recorded, and manipulated on any medium which can
be read and accessed by a computer. As used herein, the words "recorded" and "stored" refer to a
process for storing information on a computer medium. A skilled artisan can readily adopt any of the
presently known methods for recording information on a computer readable medium to generate
manufactures comprising one or more of the nucleic acid codes of the invention, or one or more of the
polypeptide codes of the invention. Another aspect of the present invention is a computer readable
medium having recorded thereon at least 2, 5, 10, 15, 20, 25, 30, or 50 nucleic acid codes of the
invention. Another aspect of the present invention is a computer readable medium having recorded
thereon at least 2, 5, 10, 15, 20, 25, 30, or 50 polypeptide codes of the invention.

Computer readable media include magnetically readable media, optically readable media,
electronically readable media and magnetic/optical media. For example, the computer readable media
may be a hard disc, a floppy disc, a magnetic tape, CD-ROM, DVD, RAM, or ROM as well as other
types of media known to those skilled in the art.

Embodiments of the present invention include systems, particularly computer systems which
contain the sequence information described herein. As used herein, "a computer system" refers to the
hardware components, software components, and data storage components used to store and/or analyze
the nucleotide sequences of the nucleic acid codes of the invention, the amino acid sequences of the
polypeptide codes of the invention, or other sequences. The computer system preferably includes the
computer readable media described above, and a processor for accessing and manipulating the sequence
data.
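
By way of a purely illustrative sketch, sequence codes can be recorded on a medium and read back as plain text. The FASTA-style layout, the file name, the record names and the sequences below are all assumptions made for the example; the specification does not prescribe any particular file format.

```python
import os
import tempfile


def write_fasta(path, records):
    """Record each named sequence as a '>' header line followed by 60-character lines."""
    with open(path, "w", encoding="ascii") as handle:
        for name, seq in records.items():
            handle.write(">%s\n" % name)
            for i in range(0, len(seq), 60):
                handle.write(seq[i:i + 60] + "\n")


def read_fasta(path):
    """Read the stored sequences back into a dictionary keyed by record name."""
    records, name = {}, None
    with open(path, encoding="ascii") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                name = line[1:]
                records[name] = ""
            elif name is not None:
                records[name] += line
    return records


# Hypothetical placeholder sequences, not entries of the sequence listing.
records = {"example_code_1": "ATGGCC" * 15, "example_code_2": "ATGAAATAG"}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "codes.fasta")
    write_fasta(path, records)
    restored = read_fasta(path)
```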

In some embodiments, the computer system may further comprise a sequence comparer for
comparing the nucleic acid codes or polypeptide codes of the invention stored on a computer readable
medium to reference nucleotide sequences stored on a computer readable medium. A "sequence
comparer" refers to one or more programs which are implemented on the computer system to compare a
nucleotide or polypeptide sequence with other nucleotide or polypeptide sequences and/or compounds
including but not limited to peptides, peptidomimetics, and chemicals the sequences or structures of
which are stored within the data storage means. For example, the sequence comparer may compare the
nucleotide sequences of the nucleic acid codes of the invention or the amino acid sequences of the
polypeptide codes of the invention stored on a computer readable medium to reference sequences stored
on a computer readable medium to identify homologies, motifs implicated in biological function, or
structural motifs. The various sequence comparer programs identified elsewhere in this patent
specification are particularly contemplated for use in this aspect of the invention.

Accordingly, one aspect of the present invention is a computer system comprising a
processor, a data storage device having stored thereon a nucleic acid code of the invention or a
polypeptide code of the invention, a data storage device having retrievably stored thereon reference
nucleotide sequences or polypeptide sequences to be compared to the nucleic acid code of the
invention or polypeptide code of the invention and a sequence comparer for conducting the
comparison. The sequence comparer may indicate a homology level between the sequences
compared or identify structural motifs in the nucleic acid code of the invention and polypeptide
codes of the invention or it may identify structural motifs in sequences which are compared to these
nucleic acid codes and polypeptide codes. In some embodiments, the data storage device may have
stored thereon the sequences of at least 2, 5, 10, 15, 20, 25, 30, or 50 of the nucleic acid codes of the
invention or polypeptide codes of the invention.

Another aspect of the present invention is a method for determining the level of homology
between a nucleic acid code of the invention and a reference nucleotide sequence, comprising the
steps of reading the nucleic acid code and the reference nucleotide sequence through the use of a
computer program which determines homology levels and determining homology between the nucleic
acid code and the reference nucleotide sequence with the computer program. The computer program
may be any of a number of computer programs for determining homology levels, including those
specifically enumerated herein, including BLAST2N with the default parameters or with any modified
parameters. The method may be implemented using the computer systems described above. The
method may also be performed by reading 2, 5, 10, 15, 20, 25, 30, or 50 of the above described nucleic
acid codes of the invention through the use of the computer program and determining homology
between the nucleic acid codes and reference nucleotide sequences.
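
As a simplified stand-in for such a program, the sketch below computes percent identity between two sequences that have already been aligned (gaps written as "-") and reports the highest of the homology levels listed above that is met. It does not reproduce BLAST2N, which also performs the alignment and scores it statistically; the function names and example sequences are hypothetical.

```python
HOMOLOGY_LEVELS = (99, 98, 97, 96, 95, 90, 85, 80, 75)


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of alignment columns in which both rows carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == gap and b == gap)  # ignore columns that are gaps in both rows
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def highest_level_met(identity):
    """Return the largest listed homology level reached, or None if below 75%."""
    return next((level for level in HOMOLOGY_LEVELS if identity >= level), None)


identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
level = highest_level_met(identity)
```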

Alternatively, the computer program may be a computer program which compares the
nucleotide sequences of the nucleic acid codes of the present invention to reference nucleotide
sequences in order to determine whether the nucleic acid code of the invention differs from a reference
nucleic acid sequence at one or more positions. Optionally such a program records the length and
identity of inserted, deleted or substituted nucleotides with respect to the sequence of either the
reference polynucleotide or the nucleic acid code of the invention. In one embodiment, the computer
program may be a program which determines whether the nucleotide sequences of the nucleic acid
codes of the invention contain one or more single nucleotide polymorphisms (SNP) with respect to a
reference nucleotide sequence. These single nucleotide polymorphisms may each comprise a single
base substitution, insertion, or deletion.
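
A minimal sketch of such a difference-recording program is given below. It assumes that the two sequences have already been aligned, with "-" marking gaps, and it merges adjacent gap columns so that the length of each insertion or deletion is recorded. The function name, the dictionary keys and the example rows are hypothetical.

```python
def list_differences(ref_aln, query_aln, gap="-"):
    """List substitutions, insertions and deletions of the query relative to the reference.

    ref_pos is the 1-based reference position of the first affected base; for an
    insertion it is the position of the reference base preceding the inserted bases.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("alignment rows must have equal length")
    events = []
    ref_pos = 0        # number of reference bases consumed so far
    prev_kind = None   # kind of difference found in the previous column
    for r, q in zip(ref_aln.upper(), query_aln.upper()):
        if r != gap:
            ref_pos += 1
        if r == q:
            kind = None
        elif r == gap:
            kind = "insertion"
        elif q == gap:
            kind = "deletion"
        else:
            kind = "substitution"
        if kind in ("insertion", "deletion") and kind == prev_kind:
            # Extend the running insertion or deletion by one base.
            events[-1]["ref"] += "" if r == gap else r
            events[-1]["query"] += "" if q == gap else q
        elif kind is not None:
            events.append({
                "kind": kind,
                "ref_pos": ref_pos,
                "ref": "" if r == gap else r,
                "query": "" if q == gap else q,
            })
        prev_kind = kind
    return events


events = list_differences("ACGT--ACGTA", "ACCTGGAC--A")
# Single-base events, matching the single base substitution, insertion or deletion above.
single_base = [e for e in events if max(len(e["ref"]), len(e["query"])) == 1]
```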

Another aspect of the present invention is a method for determining the level of homology
between a polypeptide code of the invention and a reference polypeptide sequence, comprising the
steps of reading the polypeptide code of the invention and the reference polypeptide sequence through
use of a computer program which determines homology levels and determining homology between the
polypeptide code and the reference polypeptide sequence using the computer program.

Accordingly, another aspect of the present invention is a method for determining whether a
nucleic acid code of the invention differs at one or more nucleotides from a reference nucleotide
sequence comprising the steps of reading the nucleic acid code and the reference nucleotide
sequence through use of a computer program which identifies differences between nucleic acid
sequences and identifying differences between the nucleic acid code and the reference nucleotide
sequence with the computer program. In some embodiments, the computer program is a program
which identifies single nucleotide polymorphisms. The method may be implemented by the
computer systems described above. The method may also be performed by reading at least 2, 5, 10,
15, 20, 25, 30, or 50 of the nucleic acid codes of the invention and the reference nucleotide
sequences through the use of the computer program and identifying differences between the nucleic
acid codes and the reference nucleotide sequences with the computer program.

An "identifier" refers to one or more programs which identify certain features within the
above-described nucleotide sequences of the nucleic acid codes of the invention or the amino acid
sequences of the polypeptide codes of the invention.

In one embodiment, the identifier may comprise a molecular modeling program which
determines the 3-dimensional structure of the polypeptide codes of the invention. In some
embodiments, the molecular modeling program identifies target sequences that are most compatible
with profiles representing the structural environments of the residues in known three-dimensional
protein structures. (See, e.g., Eisenberg et al., U.S. Patent No. 5,436,850 issued July 25, 1995). In
another technique, the known three-dimensional structures of proteins in a given family are
superimposed to define the structurally conserved regions in that family. This protein modeling
technique also uses the known three-dimensional structure of a homologous protein to approximate
the structure of the polypeptide codes of the invention. (See e.g., Srinivasan, et al., U.S. Patent
No. 5,557,535 issued September 17, 1996). Conventional homology modeling techniques have been
used routinely to build models of proteases and antibodies. (Sowdhamini et al., (1997)).
Comparative approaches can also be used to develop three-dimensional protein models when the
protein of interest has poor sequence identity to template proteins. In some cases, proteins fold into
similar three-dimensional structures despite having very weak sequence identities. For example, a
number of helical cytokines fold into a similar three-dimensional
topology in spite of weak sequence homology.

The recent development of threading methods now enables the identification of likely
folding patterns in a number of situations where the structural relatedness between target and
template(s) is not detectable at the sequence level. Hybrid methods have been described in which fold
recognition is performed using Multiple Sequence Threading (MST), structural equivalencies are deduced
from the threading output using a distance geometry program DRAGON to construct a low resolution model,
and a full-atom representation is constructed using a molecular modeling package such as
QUANTA. According to this 3-step approach, candidate templates are first identified by using the
novel fold recognition algorithm MST, which is capable of performing simultaneous threading of
multiple aligned sequences onto one or more 3-D structures. In a second step, the structural
equivalencies obtained from the MST output are converted into interresidue distance restraints and
fed into the distance geometry program DRAGON, together with auxiliary information obtained
from secondary structure predictions. The program combines the restraints in an unbiased manner
and rapidly generates a large number of low resolution model conformations. In a third step, these
low resolution model conformations are converted into full-atom models and subjected to energy
minimization using the molecular modeling package QUANTA. (See e.g., Aszodi et al., (1997)).

The results of the molecular modeling analysis may then be used in rational drug design
techniques to identify agents which modulate the activity of the polypeptide codes of the invention.

Accordingly, another aspect of the present invention is a method of identifying a feature
within the nucleic acid codes of the invention or the polypeptide codes of the invention comprising
reading the nucleic acid code(s) or the polypeptide code(s) through the use of a computer program
which identifies features therein and identifying features within the nucleic acid code(s) or
polypeptide code(s) with the computer program. In one embodiment, the computer program comprises a
computer program which identifies open reading frames. In a further embodiment, the computer
program identifies structural motifs in a polypeptide sequence. In another embodiment, the
computer program comprises a molecular modeling program. The method may be performed by
reading a single sequence or at least 2, 5, 10, 15, 20, 25, 30, or 50 of the nucleic acid codes of the
invention or the polypeptide codes of the invention through the use of the computer program and
identifying features within the nucleic acid codes or polypeptide codes with the computer program.
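
One such feature-identifying program, an open reading frame finder, can be sketched as follows. The sketch is deliberately simple and rests on assumptions not stated in the specification: it scans only the forward strand, treats ATG as the only start codon, and reports coordinates as 0-based half-open intervals that include the stop codon. The function name and the example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) for each ATG...stop stretch in the three forward frames."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon since the last stop in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


example = "CCATGGCTTGGTAACC"  # placeholder containing ATG GCT TGG TAA
orfs = find_orfs(example)
```

Applying the same function to the reverse complement of a sequence would cover the three frames of the opposite strand.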

The nucleic acid codes of the invention or the polypeptide codes of the invention may be
stored and manipulated in a variety of data processor programs in a variety of formats. For example,
they may be stored as text in a word processing file, such as Microsoft WORD or WORDPERFECT or
as an ASCII file in a variety of database programs familiar to those of skill in the art, such as DB2,
SYBASE, or ORACLE. In addition, many computer programs and databases may be used as sequence
comparers, identifiers, or sources of reference nucleotide or polypeptide sequences to be compared to
the nucleic acid codes of the invention or the polypeptide codes of the invention. The following list is
intended not to limit the invention but to provide guidance to programs and databases which are useful
with the nucleic acid codes of the invention or the polypeptide codes of the invention. The programs
and databases which may be used include, but are not limited to: MacPattern (EMBL), DiscoveryBase
(Molecular Applications Group), GeneMine (Molecular Applications Group), Look (Molecular
Applications Group), MacLook (Molecular Applications Group), BLAST and BLAST2 (NCBI),
BLASTN and BLASTX (Altschul et al., 1990), FASTA (Pearson and Lipman, 1988), FASTDB
(Brutlag et al., 1990), Catalyst (Molecular Simulations Inc.), Catalyst/SHAPE (Molecular Simulations
Inc.), Cerius2.DBAccess (Molecular Simulations Inc.), HypoGen (Molecular Simulations Inc.), Insight
II (Molecular Simulations Inc.), Discover (Molecular Simulations Inc.), CHARMm (Molecular
Simulations Inc.), Felix (Molecular Simulations Inc.), DelPhi (Molecular Simulations Inc.),
QuanteMM (Molecular Simulations Inc.), Homology (Molecular Simulations Inc.), Modeler
(Molecular Simulations Inc.), ISIS (Molecular Simulations Inc.), Quanta/Protein Design (Molecular
Simulations Inc.), WebLab (Molecular Simulations Inc.), WebLab Diversity Explorer (Molecular
Simulations Inc.), Gene Explorer (Molecular Simulations Inc.), SeqFold (Molecular Simulations Inc.),
the EMBL/Swissprotein database, the MDL Available Chemicals Directory database, the MDL Drug
Data Report database, the Comprehensive Medicinal Chemistry database, Derwent's World Drug
Index database, the BioByteMasterFile database, the Genbank database, and the Genseqn database.
Many other programs and databases would be apparent to one of skill in the art given the present
disclosure.

Motifs which may be detected using the above programs include sequences encoding
leucine zippers, helix-turn-helix motifs, glycosylation sites, ubiquitination sites, alpha helices, and
beta sheets, signal sequences encoding signal peptides which direct the secretion of the encoded
proteins, sequences implicated in transcription regulation such as homeoboxes, acidic stretches,
enzymatic active sites, substrate binding sites, and enzymatic cleavage sites.
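
As one concrete illustration of motif detection, the sketch below searches a polypeptide sequence for potential N-glycosylation sites using the widely used consensus pattern N-{P}-[S/T]-{P} (asparagine, any residue except proline, serine or threonine, any residue except proline). The function name and the example peptide are hypothetical, and a pattern match only indicates a candidate site, not a glycosylated residue.

```python
import re

# A lookahead is used so that overlapping candidate sites are all reported.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_n_glycosylation_sites(protein):
    """Return (1-based position of the asparagine, matched tetrapeptide) for each candidate site."""
    return [
        (match.start() + 1, match.group(1))
        for match in N_GLYCOSYLATION.finditer(protein.upper())
    ]


sites = find_n_glycosylation_sites("MNGSANPTKNNTS")  # placeholder peptide
```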



Throughout this application, various publications, patents and published patent applications
are cited. The disclosures of these publications, patents and published patent specifications
referenced in this application are hereby incorporated by reference into the present disclosure to
more fully describe the state of the art to which this invention pertains.

EXAMPLES 



EXAMPLE 1: LOCALIZATION OF THE OLFACTORY RECEPTOR GENES OLF3
AND OLF5 ON THE HUMAN CHROMOSOMES.

Metaphase chromosome preparation 

Metaphase chromosomes were prepared from phytohemagglutinin (PHA)-stimulated blood
cells from donors. PHA-stimulated lymphocytes from healthy males were cultured for 72 h in RPMI-1640
medium. For synchronization, methotrexate (10 µM) was added for 17 h, followed by addition of
5-bromodeoxyuridine (5-BrdU, 0.1 mM) for 6 h. Colcemid (1 mg/ml) was added for the last 15 min
before harvesting the cells. Cells were collected, washed in RPMI, incubated with a hypotonic
solution of KCl (75 mM) at 37°C for 15 min and fixed in three changes of methanol:acetic acid
(3:1). The cell suspension was dropped onto a glass slide, air-dried and kept in darkness at -20°C
until use.

Probes: 

- The BAC H0526H04 containing the Olf3 and Olf5 genes was used to generate a probe by
Alu-PCR. PCR amplification of BAC recombinant DNA (50 ng) was carried out as described by Romana
et al. (1993).
- Two DNA fragments carrying respectively Olf3 and Olf5 sequences were generated by
long range PCR with specific primers (SEQ ID 96-99) and used as probes to confirm the localization
of each gene. The Olf3 and Olf5 amplicons are respectively 2.8 kb and 3.2 kb fragments.

Probes were labeled by nick translation with bio-16-dUTP (Boehringer Mannheim), and
purified over a Sephadex G50 column.

Fluorescence In Situ Hybridization 

To determine the chromosomal localization of both genes, the BAC probe was initially
hybridized to human metaphase cells. When biotinylated PCR products of BAC DNA were used in
the hybridization experiment, 75 ng of probe was precipitated with 75 µg of competitor DNA (human
Cot1 DNA, GIBCO-BRL) and resuspended in 10 µl of hybridization buffer (50% formamide, 2 X
SSC, 10% dextran sulfate, 1 mg/ml sonicated herring DNA, pH 7). When long range PCR products
of the Olf3 or Olf5 genes were used as probe, 5 ng of biotinylated probe were mixed with 5 µg of
human Cot1 DNA. Prior to hybridization, the probe was denatured at 70°C for 10 min and
preannealed at 37°C for 2 h.

Slides were treated for 1 h at 37°C with RNase A (100 ng/ml), rinsed three times in 2 X SSC
and dehydrated in an ethanol series. Chromosome preparations were denatured in 70% formamide, 2
X SSC (pH 7), for 2 min at 70°C, then dehydrated at 4°C. The slides were treated with proteinase K
(10 µg/ml in 20 mM Tris-HCl, 2 mM CaCl2) at 37°C for 8-10 min and dehydrated. After
preannealing, the hybridization mixture containing the probe was placed on the slide, covered with a
coverslip, sealed with rubber cement and incubated overnight in a humid chamber at 37°C. After
hybridization and post hybridization washes, the biotinylated probe was detected by avidin-FITC (5
µg/ml, Vector Laboratories) and amplified once with additional layers of biotinylated goat
anti-avidin (5 µg/ml, Vector Laboratories) and avidin-FITC. For chromosomal localization, fluorescent
R-bands were obtained as described by Cherif et al. (1990). The slides were observed under a
LEICA fluorescent microscope (DMRXA). Chromosomes were counterstained with propidium
iodide and the fluorescent signal of the probe appeared as two symmetrical yellow-green spots on
both chromatids of the fluorescent R-band chromosome.

Localization 

A specific signal (a double yellow-green spot) was observed on band 11q12-q13 on at least
one chromosome 11 in >80% of the metaphases with all the probes.

EXAMPLE 2 : IDENTIFICATION OF BIALLELIC MARKERS: DNA 
EXTRACTION 

Donors were unrelated and healthy. They presented sufficient diversity to be representative of a
heterogeneous French population. The DNA from 100 individuals was extracted and tested for the
detection of the biallelic markers.

30 ml of peripheral venous blood were taken from each donor in the presence of EDTA.
Cells (pellet) were collected after centrifugation for 10 minutes at 2000 rpm. Red cells were lysed by
a lysis solution (50 ml final volume: 10 mM Tris pH 7.6; 5 mM MgCl2; 10 mM NaCl). The solution
was centrifuged (10 minutes, 2000 rpm) as many times as necessary to eliminate the residual red
cells present in the supernatant, after resuspension of the pellet in the lysis solution.

The pellet of white cells was lysed overnight at 42°C with 3.7 ml of lysis solution composed
of:

- 3 ml TE 10-2 (Tris-HCl 10 mM, EDTA 2 mM) / NaCl 0.4 M
- 200 µl SDS 10%
- 500 µl K-proteinase (2 mg K-proteinase in TE 10-2 / NaCl 0.4 M).

For the extraction of proteins, 1 ml saturated NaCl (6M) (1/3.5 v/v) was added. After
vigorous agitation, the solution was centrifuged for 20 minutes at 10000 rpm.

For the precipitation of DNA, 2 to 3 volumes of 100% ethanol were added to the previous
supernatant, and the solution was centrifuged for 30 minutes at 2000 rpm. The DNA solution was
rinsed three times with 70% ethanol to eliminate salts, and centrifuged for 20 minutes at 2000 rpm.
The pellet was dried at 37°C, and resuspended in 1 ml TE 10-1 or 1 ml water. The DNA
concentration was evaluated by measuring the OD at 260 nm (1 unit OD = 50 µg/ml DNA).

To determine the presence of proteins in the DNA solution, the OD 260 / OD 280 ratio was
determined. Only DNA preparations having an OD 260 / OD 280 ratio between 1.8 and 2 were used
in the subsequent examples described below.

The pool was constituted by mixing equivalent quantities of DNA from each individual. 
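
By way of illustration only, the concentration estimate, the purity criterion and the constitution of the pool described above may be sketched in Python as follows. The conversion factor (1 OD unit = 50 µg/ml) and the 1.8 to 2 acceptance window are taken from the text; the function names, the dilution factor parameter and the DNA mass contributed per individual are assumptions introduced for the example.

```python
def dna_concentration_ug_per_ml(od260, dilution_factor=1.0):
    """Estimate DNA concentration from OD at 260 nm (1 OD unit = 50 µg/ml DNA)."""
    return od260 * 50.0 * dilution_factor


def is_acceptable(od260, od280, low=1.8, high=2.0):
    """Keep only preparations whose OD 260 / OD 280 ratio lies between 1.8 and 2."""
    return low <= od260 / od280 <= high


def pooling_volumes_ul(concentrations_ug_per_ml, mass_per_sample_ug):
    """Volume (µl) of each sample that contributes the same DNA mass to the pool.

    mass_per_sample_ug is a hypothetical choice; the text only requires
    equivalent quantities of DNA from each individual.
    """
    return [1000.0 * mass_per_sample_ug / c for c in concentrations_ug_per_ml]


if __name__ == "__main__":
    print(dna_concentration_ug_per_ml(0.5))          # concentration in µg/ml
    print(is_acceptable(1.0, 0.55))                  # ratio of about 1.82
    print(pooling_volumes_ul([100.0, 200.0], 1.0))   # µl per individual
```

A more concentrated preparation contributes a proportionally smaller volume, so that each individual is represented by the same mass of DNA in the pool.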

EXAMPLE 3 : IDENTIFICATION OF BIALLELIC MARKERS: AMPLIFICATION 
OF GENOMIC DNA BY PCR 

The amplification of specific genomic sequences of the DNA samples of example 2 was
carried out on the pool of DNA obtained previously. In addition, 50 individual samples were
similarly amplified.

PCR assays were performed using the following protocol: 



Final volume                                          25 µl
DNA                                                   2 ng/µl
MgCl2                                                 2 mM
dNTP (each)                                           200 µM
primer (each)                                         2.9 ng/µl
AmpliTaq Gold DNA polymerase                          0.05 unit/µl
PCR buffer (10x = 0.1 M Tris-HCl pH 8.3, 0.5 M KCl)   1x
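
As a worked illustration of the protocol above, the amounts present in one 25 µl reaction can be derived from the final concentrations, and the volume of a stock solution follows from C1 x V1 = C2 x V2. The sketch below is in Python; the dictionary keys and function names are assumptions made for the example, and only the 10x buffer stock is specified in the text (any other stock concentration would have to be supplied by the user).

```python
FINAL_VOLUME_UL = 25.0

# Final concentrations per µl of reaction, taken from the protocol above.
FINAL_CONCENTRATIONS = {
    "DNA (ng)": 2.0,
    "primer, each (ng)": 2.9,
    "polymerase (units)": 0.05,
}


def per_reaction_amounts(volume_ul=FINAL_VOLUME_UL):
    """Total amount of each component in one reaction of the given volume."""
    return {name: round(conc * volume_ul, 3)
            for name, conc in FINAL_CONCENTRATIONS.items()}


def stock_volume_ul(stock_conc, final_conc, final_volume_ul=FINAL_VOLUME_UL):
    """Volume of stock to pipette, from C1 * V1 = C2 * V2 (same units for C1 and C2)."""
    return final_conc * final_volume_ul / stock_conc


if __name__ == "__main__":
    print(per_reaction_amounts())
    print(stock_volume_ul(10.0, 1.0))  # 10x PCR buffer diluted to 1x
```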



Each pair of first primers was designed using the sequence information of the olfactory
receptor gene cluster disclosed herein and the OSP software (Hillier & Green, 1991). This first pair
of primers was about 20 nucleotides in length and had the sequences disclosed in Table 1 in the
columns labeled PU and RP.



Table 1

Amplicon   Position range of   Primer   Position range of    Primer   Complementary position
           the amplicon in     name     amplification        name     range of amplification
           SEQ ID No 1         RP       primer in SEQ ID     PU       primer in SEQ ID No 1
                                        No 1

99-13670   7362-7824           B1       7362-7380            C1       7805-7824
99-13669   8120-8662           B2       8120-8140            C2       8643-8662
99-13666   14308-14757         B3       14308-14328          C3       14740-14757
99-13664   19346-19845         B4       19346-19366          C4       19826-19845
99-13663   20298-20800         B5       20298-20318          C5       20781-20800
99-13660   76752-77223         B6       76752-76772          C6       77205-77223
99-13652   90967-91494         B7       90967-90987          C7       91474-91494
99-13671   133925-134393       B8       133925-133945        C8       134375-134393
99-13649   139807-140351       B9       139807-139826        C9       140331-140351
99-13648   140912-141434       B10      140912-140932        C10      141416-141434
99-13647   143828-144309       B11      143828-143847        C11      144292-144309



Preferably, the primers contained a common oligonucleotide tail upstream of the specific
bases targeted for amplification which was useful for sequencing.

Primers PU contain the following additional PU 5' sequence:
TGTAAAACGACGGCCAGT; primers RP contain the following RP 5' sequence:
CAGGAAACAGCTATGACC. The primer containing the additional PU 5' sequence is listed in
SEQ ID No 26. The primer containing the additional RP 5' sequence is listed in SEQ ID No 27.
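
The construction of a tailed amplification primer from a position range of Table 1 may be sketched as follows in Python. The two tail sequences are those given above. Following the labelling of Table 1, the RP primers are read directly from the listed strand and the PU primers from the complementary strand. The reference string used in the usage lines is a short made-up sequence standing in for SEQ ID No 1, and all function names are assumptions made for the example.

```python
PU_TAIL = "TGTAAAACGACGGCCAGT"   # additional PU 5' sequence
RP_TAIL = "CAGGAAACAGCTATGACC"   # additional RP 5' sequence

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def specific_primer(reference, start, end, complementary=False):
    """Extract a primer from 1-based inclusive positions of the reference.

    With complementary=True the primer is read on the opposite strand,
    as for the "complementary position range" column of Table 1.
    """
    segment = reference[start - 1:end]
    return reverse_complement(segment) if complementary else segment


def tailed_primer(tail, specific):
    """Place the common tail upstream (5') of the specific bases."""
    return tail + specific


if __name__ == "__main__":
    toy_reference = "ACGTACGTAAGGCCTT"  # hypothetical stand-in for SEQ ID No 1
    rp = tailed_primer(RP_TAIL, specific_primer(toy_reference, 1, 4))
    pu = tailed_primer(PU_TAIL, specific_primer(toy_reference, 13, 16, complementary=True))
    print(rp)
    print(pu)
```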

The synthesis of these primers was performed following the phosphoramidite method, on a 
GENSET UFPS 24.1 synthesizer. 

DNA amplification was performed on a Genius II thermocycler. After heating at 95°C for 10
min, 40 cycles were performed. Each cycle comprised: 30 sec at 95°C, 54°C for 1 min, and 30 sec at
72°C. For final elongation, 10 min at 72°C ended the amplification. The quantities of the
amplification products obtained were determined on 96-well microtiter plates, using a fluorometer
and Picogreen as intercalating agent (Molecular Probes).
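
For orientation, the programmed hold time of the amplification described above can be added up as in the following Python sketch. The step durations are those of the text; the function name is an assumption made for the example, and temperature ramping between steps, which depends on the instrument, is not counted.

```python
def program_minutes(initial_s=600, cycles=40, cycle_steps_s=(30, 60, 30), final_s=600):
    """Sum of programmed hold times in minutes, ignoring ramp times.

    Defaults: 10 min at 95°C, then 40 cycles of 30 sec at 95°C,
    1 min at 54°C and 30 sec at 72°C, then 10 min at 72°C.
    """
    total_s = initial_s + cycles * sum(cycle_steps_s) + final_s
    return total_s / 60


if __name__ == "__main__":
    print(program_minutes())
```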

EXAMPLE 4 : IDENTIFICATION OF BIALLELIC MARKERS: SEQUENCING OF 
AMPLIFIED GENOMIC DNA AND IDENTIFICATION OF POLYMORPHISMS. 

The sequencing of the amplified DNA obtained in example 3 was carried out on ABI 377
sequencers. The sequences of the amplification products were determined using automated dideoxy
terminator sequencing reactions with a dye terminator cycle sequencing protocol. The products of
the sequencing reactions were run on sequencing gels and the sequences were determined using gel
image analysis (ABI Prism DNA Sequencing Analysis software (2.1.2 version)).

The sequence data were further evaluated using the above mentioned polymorphism analysis
software designed to detect the presence of biallelic markers among the pooled amplified fragments.
The polymorphism search was based on the presence of superimposed peaks in the electrophoresis
pattern resulting from different bases occurring at the same position as described previously.

Eleven amplification fragments were analyzed. In these segments, 13 biallelic markers,
referred to as A1 to A13 in the BM column, were detected. The localization of these biallelic markers
is as shown in Table 2.

Table 2

Amplicon   BM    Marker Name    Localization in OLF      Polymorphism   BM position in
                                gene cluster                            SEQ ID No 1

99-13670   A1    99-13670-305   Between Orf1 and Orf2    A/C            7521
99-13669   A2    99-13669-471   Between Orf1 and Orf2    A/C            8192
99-13666   A3    99-13666-275   Between Orf2 and Orf3    A/T            14483
99-13664   A4    99-13664-221   Between Orf2 and Orf3    A/G            19625
99-13663   A5    99-13663-218   Between Orf2 and Orf3    C/T            20583
99-13660   A6    99-13660-277   Between Orf4 and Orf5    G/T            76947
99-13652   A7    99-13652-407   Between Orf5 and Orf6    G/C            91088
99-13652   A8    99-13652-357   Between Orf5 and Orf6    C/T            91138
99-13652   A9    99-13652-308   Between Orf5 and Orf6    C/T            91187
99-13671   A10   99-13671-396   Between Orf9 and Orf10   C/T            133998
99-13649   A11   99-13649-286   Between Orf9 and Orf10   A/G            140066
99-13648   A12   99-13648-259   Between Orf9 and Orf10   C/T            141176
99-13647   A13   99-13647-278   After Orf10              C/T            144033



Table 3

BM    Marker Name    Position range of probes   Probes
                     in SEQ ID No 1

A1    99-13670-305   7498-7544                  P1
A2    99-13669-471   8169-8215                  P2
A3    99-13666-275   14460-14506                P3
A4    99-13664-221   19602-19648                P4
A5    99-13663-218   20560-20606                P5
A6    99-13660-277   76924-76970                P6
A7    99-13652-407   91065-91111                P7
A8    99-13652-357   91115-91161                P8
A9    99-13652-308   91164-91210                P9
A10   99-13671-396   133975-134021              P10
A11   99-13649-286   140043-140089              P11
A12   99-13648-259   141153-141199              P12
A13   99-13647-278   144010-144056              P13
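
The position ranges of Table 3 are consistent with 47-nucleotide probes centered on the biallelic marker positions of Table 2 (23 nucleotides on each side of the polymorphic base). This reading is an observation drawn from the two tables rather than a statement of the text, and it may be checked with the following Python sketch, in which the function names are assumptions made for the example.

```python
# Marker positions in SEQ ID No 1, copied from Table 2.
MARKER_POSITIONS = {
    "A1": 7521, "A2": 8192, "A3": 14483, "A4": 19625, "A5": 20583,
    "A6": 76947, "A7": 91088, "A8": 91138, "A9": 91187, "A10": 133998,
    "A11": 140066, "A12": 141176, "A13": 144033,
}

# Probe position ranges in SEQ ID No 1, copied from Table 3.
PROBE_RANGES = {
    "A1": (7498, 7544), "A2": (8169, 8215), "A3": (14460, 14506),
    "A4": (19602, 19648), "A5": (20560, 20606), "A6": (76924, 76970),
    "A7": (91065, 91111), "A8": (91115, 91161), "A9": (91164, 91210),
    "A10": (133975, 134021), "A11": (140043, 140089),
    "A12": (141153, 141199), "A13": (144010, 144056),
}


def centered_probe_range(marker_position, flank=23):
    """1-based inclusive range of a probe with `flank` bases on each side of the marker."""
    return marker_position - flank, marker_position + flank


def all_probes_centered():
    """True if every Table 3 range equals the centered range for its marker."""
    return all(centered_probe_range(pos) == PROBE_RANGES[bm]
               for bm, pos in MARKER_POSITIONS.items())


if __name__ == "__main__":
    print(centered_probe_range(MARKER_POSITIONS["A1"]))
    print(all_probes_centered())
```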
EXAMPLE 5 : VALIDATION OF THE POLYMORPHISMS THROUGH
MICROSEQUENCING

The biallelic markers identified in example 4 were further confirmed and their respective
frequencies were determined through microsequencing. Microsequencing was carried out for each
individual DNA sample described in Example 2.

Amplification from genomic DNA of individuals was performed by PCR as described above
for the detection of the biallelic markers with the same set of PCR primers (Table 1).

The preferred primers used in microsequencing were about 19 nucleotides in length and
hybridized just upstream of the considered polymorphic base. According to the invention, the
primers used in microsequencing are detailed in Table 4.

Table 4

Marker Name    BM    Mis. 1   Position range of        Mis. 2   Complementary position range
                              microsequencing primer            of microsequencing primer
                              mis. 1 in SEQ ID No 1             mis. 2 in SEQ ID No 1

99-13670-305   A1    D1       7502-7520                E1       7522-7540
99-13669-471   A2    D2       8173-8191                E2       8193-8211
99-13666-275   A3    D3       14464-14482              E3       14484-14502
99-13664-221   A4    D4       19606-19624              E4       19626-19644
99-13663-218   A5    D5       20564-20582              E5       20584-20602
99-13660-277   A6    D6       76928-76946              E6       76948-76966
99-13652-407   A7    D7       91069-91087              E7       91089-91107
99-13652-357   A8    D8       91119-91137              E8       91139-91157
99-13652-308   A9    D9       91168-91186              E9       91188-91206
99-13671-396   A10   D10      133979-133997            E10      133999-134017
99-13649-286   A11   D11      140047-140065            E11      140067-140085
99-13648-259   A12   D12      141157-141175            E12      141177-141195
99-13647-278   A13   D13      144014-144032            E13      144034-144052



Mis 1 and Mis 2 respectively refer to microsequencing primers which hybridized with the
non-coding strand of the olfactory receptor gene or with the coding strand of the olfactory receptor
gene.
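
The position ranges of Table 4 correspond to 19-nucleotide primers ending immediately next to the polymorphic base: Mis 1 occupies the 19 positions preceding the marker in SEQ ID No 1, and Mis 2 is read on the complementary strand over the 19 positions following it. A minimal Python sketch of this relationship is given below; the function name and the returned tuple layout are assumptions made for the example.

```python
def microsequencing_ranges(marker_position, length=19):
    """Return ((mis1_start, mis1_end), (mis2_start, mis2_end)).

    Positions are 1-based and inclusive in SEQ ID No 1. The Mis 2 range is
    a complementary position range, i.e. the primer is read on the
    opposite strand.
    """
    mis1 = (marker_position - length, marker_position - 1)
    mis2 = (marker_position + 1, marker_position + length)
    return mis1, mis2


if __name__ == "__main__":
    # Marker A1 (99-13670-305) lies at position 7521 of SEQ ID No 1.
    print(microsequencing_ranges(7521))
```

Because each primer ends one nucleotide before the marker on its own strand, the single fluorescent ddNTP incorporated during the reaction reports the identity of the polymorphic base.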

The microsequencing reaction was performed as follows:

After purification of the amplification products, the microsequencing reaction mixture was
prepared by adding, in a 20 µl final volume: 10 pmol microsequencing oligonucleotide, 1 U
Thermosequenase (Amersham E79000G), 1.25 µl Thermosequenase buffer (260 mM Tris HCl pH
9.5, 65 mM MgCl2), and the two appropriate fluorescent ddNTPs (Perkin Elmer, Dye Terminator Set
401095) complementary to the nucleotides at the polymorphic site of each biallelic marker tested,
following the manufacturer's recommendations. After 4 minutes at 94°C, 20 PCR cycles of 15 sec at
55°C, 5 sec at 72°C, and 10 sec at 94°C were carried out in a Tetrad PTC-225 thermocycler (MJ
Research). The unincorporated dye terminators were then removed by ethanol precipitation. Samples
were finally resuspended in formamide-EDTA loading buffer and heated for 2 min at 95°C before
being loaded on a polyacrylamide sequencing gel. The data were collected by an ABI PRISM 377
DNA sequencer and processed using the GENESCAN software (Perkin Elmer).

Following gel analysis, data were automatically processed with software that allows the
determination of the alleles of biallelic markers present in each amplified fragment.

The software evaluates such factors as whether the intensities of the signals resulting from
the above microsequencing procedures are weak, normal, or saturated, or whether the signals are
ambiguous. In addition, the software identifies significant peaks (according to shape and height
criteria). Among the significant peaks, peaks corresponding to the targeted site are identified based
on their position. When two significant peaks are detected for the same position, each sample is
categorized as homozygous or heterozygous based on the height ratio.
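
The classification step described above may be illustrated by the following Python sketch. It is not the software used in this example: the text states only that samples are categorized according to the height ratio of the two peaks, so the minimum peak height and the ratio threshold below are hypothetical values chosen for the illustration, as are the function name and the returned labels.

```python
def call_genotype(height_allele1, height_allele2, min_height=100.0, het_ratio=0.3):
    """Classify a sample from the peak heights of the two alleles.

    min_height and het_ratio are hypothetical thresholds, not values
    taken from the text.
    """
    major = max(height_allele1, height_allele2)
    minor = min(height_allele1, height_allele2)
    # Signals that are too weak are not interpreted.
    if major <= 0 or major < min_height:
        return "no call"
    # Two peaks of comparable height indicate that both alleles are present.
    if minor / major >= het_ratio:
        return "heterozygous"
    if height_allele1 > height_allele2:
        return "homozygous allele 1"
    return "homozygous allele 2"


if __name__ == "__main__":
    print(call_genotype(1000.0, 900.0))
    print(call_genotype(1000.0, 50.0))
    print(call_genotype(20.0, 30.0))
```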

While the preferred embodiment of the invention has been illustrated and described, it will
be appreciated that various changes can be made therein by one skilled in the art without
departing from the spirit and scope of the invention.
SEQUENCE LISTING FREE TEXT

The following free text appears in the accompanying Sequence Listing:
open reading frame
ubiquitin 1 pseudogene complement
ubiquitin 2 pseudogene complement
polymorphic base
or
complement
probe
sequencing oligonucleotide PrimerPU
sequencing oligonucleotide PrimerRP
What is claimed:

1. An isolated, purified, or recombinant polynucleotide comprising a contiguous span of at
least 12 nucleotides of SEQ ID No 1 or the complements thereof, wherein said contiguous span
comprises at least 1 of the following nucleotide positions of SEQ ID No 1: 1-113643,
114064-127488, 127855-144460.

2. An isolated, purified, or recombinant polynucleotide comprising a contiguous span of at
least 12 nucleotides of a sequence selected from the group consisting of SEQ ID Nos 2-11 or the
complements thereof.

3. An isolated, purified, or recombinant polynucleotide consisting essentially of a
contiguous span of 8 to 50 nucleotides of SEQ ID No 1 or the complement thereof, wherein said
span includes an olfactory receptor-related biallelic marker in said sequence.

4. A polynucleotide according to claim 3, wherein said olfactory receptor-related biallelic
marker is selected from the group consisting of A1 to A13, and the complements thereof.

5. A polynucleotide according to claims 3 or 4, wherein said contiguous span is 18 to 47
nucleotides in length and said biallelic marker is within 4 nucleotides of the center of said
polynucleotide.

6. A polynucleotide according to claim 5, wherein said polynucleotide consists essentially
of a sequence selected from the following sequences: P1 to P13, and the complementary sequences
thereto.

7. A polynucleotide according to any one of claims 1, 2 or 3, wherein the 3' end of said
contiguous span is present at the 3' end of said polynucleotide.

8. A polynucleotide according to claims 3 or 4, wherein the 3' end of said contiguous span is
located at the 3' end of said polynucleotide and said biallelic marker is present at the 3' end of said
polynucleotide.

9. An isolated, purified, or recombinant polynucleotide consisting essentially of a contiguous
span of 8 to 50 nucleotides of SEQ ID No 1 or the complement thereof, wherein the 3' end of said
contiguous span is located at the 3' end of said polynucleotide, and wherein the 3' end of said



WO 00/21985 PCT/IB99/01729 


polynucleotide is located within 20 nucleotides upstream of an olfactory receptor-related biallelic
marker in said sequence.

10. A polynucleotide according to claim 9, wherein the 3' end of said polynucleotide is
located 1 nucleotide upstream of said olfactory receptor-related biallelic marker in said sequence.

11. A polynucleotide according to claim 10, wherein said polynucleotide consists
essentially of a sequence selected from the following sequences: D1 to D13, and E1 to E13.

12. A polynucleotide according to claim 7 consisting essentially of a sequence selected from
the following sequences: B1 to B11 and C1 to C11.

13. An isolated, purified, or recombinant polynucleotide which encodes a polypeptide
comprising a contiguous span of at least 6 amino acids of a sequence selected from the group
consisting of SEQ ID Nos 12-21.

14. A polynucleotide for use in a genotyping assay for determining the identity of the
nucleotide at an olfactory receptor-related biallelic marker or the complement thereof.

15. A polynucleotide according to claim 14, wherein the polynucleotide is used in an assay
selected from the group consisting of: a hybridization assay, a sequencing assay, an enzyme-based
mismatch detection assay, and an amplification of a segment of nucleotides comprising said biallelic
marker.

16. A polynucleotide according to any one of claims 1-15 attached to a solid support.

17. An array of polynucleotides comprising at least one polynucleotide according to claim
16.

18. An array according to claim 17, wherein said array is addressable.

19. A polynucleotide according to any one of claims 1-15, further comprising a label.



20. A recombinant vector comprising a polynucleotide according to any one of claims 1-15.

21. A host cell comprising a recombinant vector according to claim 20.
22. A non-human host animal or mammal comprising a recombinant vector according to
claim 20.

23. A mammalian host cell comprising an olfactory receptor gene disrupted by homologous
recombination with a knock out vector, comprising a polynucleotide according to any one of claims
1-15.

24. A non-human host mammal comprising an olfactory receptor gene disrupted by
homologous recombination with a knock out vector, comprising a polynucleotide according to any
one of claims 1-15.

25. An isolated, purified, or recombinant polypeptide comprising a contiguous span of at
least 6 amino acids of a sequence selected from the group consisting of SEQ ID Nos 12-21.

26. An isolated or purified antibody composition capable of selectively binding to an
epitope-containing fragment of a polypeptide according to claim 25.

27. A method of genotyping comprising determining the identity of a nucleotide at an
olfactory receptor-related biallelic marker or the complement thereof in a biological sample.

28. A method according to claim 27, wherein said biological sample is derived from a
single subject.

29. A method according to claim 28, wherein the identity of the nucleotides at said biallelic
marker is determined for both copies of said biallelic marker present in said individual's genome.

30. A method according to claim 27, wherein said biological sample is derived from
multiple subjects.

31. A method according to claim 27, further comprising amplifying a portion of said
sequence comprising the biallelic marker prior to said determining step.

32. A method according to claim 31, wherein said amplifying step is performed by PCR.

33. A method according to claim 27, wherein said determining is performed by an assay
selected from the group consisting of: a hybridization assay, a sequencing assay, a microsequencing
assay, and an enzyme-based mismatch detection assay.
34. A method according to claim 27 wherein said olfactory receptor-related biallelic marker
is selected from the group consisting of A1 to A13 and the complements thereof.

35. A method for the screening of a candidate substance interacting with an olfactory
receptor polypeptide selected from the group consisting of SEQ ID Nos 12-21, or fragments or
variants thereof, wherein said method comprises the following steps:

a) providing a polypeptide selected from the group consisting of the sequences of SEQ ID
Nos 12-21, or a peptide fragment or a variant thereof;

b) obtaining a candidate substance;

c) bringing into contact said polypeptide with said candidate substance; and

d) detecting the complexes formed between said polypeptide and said candidate substance.

36. A method for the screening of ligand molecules interacting with an olfactory receptor
polypeptide selected from the group consisting of SEQ ID Nos 12-21, wherein said method
comprises:

a) providing a recombinant eukaryotic host cell containing a nucleic acid encoding a
polypeptide selected from the group consisting of the polypeptides comprising the amino acid
sequences SEQ ID Nos 12-21;

b) preparing membrane extracts of said recombinant eukaryotic host cell;

c) bringing into contact the membrane extracts prepared at step b) with a selected ligand
molecule; and

d) detecting the production level of second messengers metabolites.

37. A method for the screening of ligand molecules interacting with an olfactory receptor
polypeptide selected from the group consisting of SEQ ID Nos 12-21, wherein said method
comprises:

a) providing an adenovirus containing a nucleic acid encoding a polypeptide selected from
the group consisting of the polypeptides comprising the amino acid sequences SEQ ID Nos 12-21;

b) infecting an olfactory epithelium with said adenovirus;

c) bringing into contact the olfactory epithelium of step b) with a selected ligand molecule; and

d) detecting the increase of the response to said ligand molecule.
FIGURE 1 (continued)

<120> Genes encoding olfactory receptors and biallelic markers thereof.

<150> US 60/104,299
<151> 1999-10-13

<160> 27

<170> Patent.pm

<210> 1
<211> 144460
<212> DNA
<213> Homo sapiens

<220>
<221> CDS
<222> 2406..2600
<223> open reading frame 1

<220>
<221> CDS
<222> 9711..10658
<223> open reading frame 2

<220>
<221> CDS
<222> 24851..25369
<223> open reading frame 3

<220>
<221> CDS
<222> 45714..46661
<223> open reading frame 4

<220>
<221> CDS
<222> 80198..81115
<223> open reading frame 5

<220>
<221> CDS
<222> 96291..96902
<223> open reading frame 6

<220>
<221> CDS
<222> 110758..111564
<223> open reading frame 7

<220>
<221> CDS
<222> 122525..122887
<223> open reading frame 8

<220>
<221> CDS
<222> 132454..133389
<223> open reading frame 9

<220>
<221> CDS
<222> 143398..143577
<223> open reading frame 10

<220>
<221> misc_feature
<222> 113644..114063
<223> ubiquitin 1 pseudogene complement

<220>
<221> misc_feature
<222> 127489..127854
<223> ubiquitin 2 pseudogene complement

<220>
<221> allele
<222> 7521
<223> 99-13670-305 : polymorphic base G or T

<220>
<221> allele
<222> 8192
<223> 99-13669-471 : polymorphic base G or T

<220>
<221> allele
<222> 14483
<223> 99-13666-275 : polymorphic base A or T

<220>
<221> allele
<222> 19625
<223> 99-13664-221 : polymorphic base C or T

<220>
<221> allele
<222> 20583
<223> 99-13663-218 : polymorphic base A or G

<220>
<221> allele
<222> 76947
<223> 99-13660-277 : polymorphic base A or C

<220>
<221> allele
<222> 91088
<223> 99-13652-407 : polymorphic base G or C

<220>
<221> allele
<222> 91138
<223> 99-13652-357 : polymorphic base A or G

<220>
<221> allele
<222> 91187
<223> 99-13652-308 : polymorphic base A or G

<220>
<221> allele
<222> 133998
<223> 99-13671-396 : polymorphic base A or G

<220>
<221> allele
<222> 140066
<223> 99-13649-286 : polymorphic base C or T

<220>
<221> allele
<222> 141176
<223> 99-13648-259 : polymorphic base A or G

<220>
<221> allele
<222> 144033
<223> 99-13647-278 : polymorphic base A or G



<220>
<221> primer_bind
<222> 7362..7380
<223> 99-13670.rp

<220>
<221> primer_bind
<222> 7805..7824
<223> 99-13670.pu complement

<220>
<221> primer_bind
<222> 8120..8140
<223> 99-13669.rp

<220>
<221> primer_bind
<222> 8643..8662
<223> 99-13669.pu complement

<220>
<221> primer_bind
<222> 14308..14328
<223> 99-13666.rp

<220>
<221> primer_bind
<222> 14740..14757
<223> 99-13666.pu complement

<220>
<221> primer_bind
<222> 19346..19366
<223> 99-13664.rp

<220>
<221> primer_bind
<222> 19826..19845
<223> 99-13664.pu complement

<220>
<221> primer_bind
<222> 20298..20318
<223> 99-13663.rp

<220>
<221> primer_bind
<222> 20781..20800
<223> 99-13663.pu complement

<220>
<221> primer_bind
<222> 76752..76772
<223> 99-13660.rp

<220>
<221> primer_bind
<222> 77205..77223
<223> 99-13660.pu complement

<220>
<221> primer_bind
<222> 90967..90987
<223> 99-13652.rp

<220>
<221> primer_bind
<222> 91474..91494
<223> 99-13652.pu complement

<220>
<221> primer_bind
<222> 133925..133945
<223> 99-13671.rp

<220>
<221> primer_bind
<222> 134375..134393
<223> 99-13671.pu complement

<220>
<221> primer_bind
<222> 139807..139826
<223> 99-13649.rp

<220>
<221> primer_bind
<222> 140331..140351
<223> 99-13649.pu complement

<220>
<221> primer_bind
<222> 140912..140932
<223> 99-13648.rp

<220>
<221> primer_bind
<222> 141416..141434
<223> 99-13648.pu complement

<220>
<221> primer_bind
<222> 143828..143847
<223> 99-13647.rp

<220>
<221> primer_bind
<222> 144292..144309
<223> 99-13647.pu complement
<220>
<221> misc_binding
<222> 7498..7544
<223> 99-13670-305.probe

<220>
<221> misc_binding
<222> 8169..8215
<223> 99-13669-471.probe

<220>
<221> misc_binding
<222> 14460..14506
<223> 99-13666-275.probe

<220>
<221> misc_binding
<222> 19602..19648
<223> 99-13664-221.probe

<220>
<221> misc_binding
<222> 20560..20606
<223> 99-13663-218.probe

<220>
<221> misc_binding
<222> 76924..76970
<223> 99-13660-277.probe

<220>
<221> misc_binding
<222> 91065..91111
<223> 99-13652-407.probe

<220>
<221> misc_binding
<222> 91115..91161
<223> 99-13652-357.probe

<220>
<221> misc_binding
<222> 91164..91210
<223> 99-13652-308.probe

<220>
<221> misc_binding
<222> 133975..134021
<223> 99-13671-396.probe

<220>
<221> misc_binding
<222> 140043..140089
<223> 99-13649-286.probe

<220>
<221> misc_binding
<222> 141153..141199
<223> 99-13648-259.probe

<220>
<221> misc_binding
<222> 144010..144056
<223> 99-13647-278.probe
<220>
<221> primer_bind
<222> 7502..7520
<223> 99-13670-305.mis

<220>
<221> primer_bind
<222> 7522..7540
<223> 99-13670-305.mis complement

<220>
<221> primer_bind
<222> 8173..8191
<223> 99-13669-471.mis

<220>
<221> primer_bind
<222> 8193..8211
<223> 99-13669-471.mis complement

<220>
<221> primer_bind
<222> 14464..14482
<223> 99-13666-275.mis

<220>
<221> primer_bind
<222> 14484..14502
<223> 99-13666-275.mis complement

<220>
<221> primer_bind
<222> 19606..19624
<223> 99-13664-221.mis

<220>
<221> primer_bind
<222> 19626..19644
<223> 99-13664-221.mis complement

<220>
<221> primer_bind
<222> 20564..20582
<223> 99-13663-218.mis

<220>
<221> primer_bind
<222> 20584..20602
<223> 99-13663-218.mis complement

<220>
<221> primer_bind
<222> 76928..76946
<223> 99-13660-277.mis

<220>
<221> primer_bind
<222> 76948..76966
<223> 99-13660-277.mis complement

<220>
<221> primer_bind
<222> 91069..91087
<223> 99-13652-407.mis

<220>
<221> primer_bind
<222> 91089..91107
<223> 99-13652-407.mis complement

<220>
<221> primer_bind
<222> 91119..91137
<223> 99-13652-357.mis

<220>
<221> primer_bind
<222> 91139..91157
<223> 99-13652-357.mis complement

<220>
<221> primer_bind
<222> 91168..91186
<223> 99-13652-308.mis

<220>
<221> primer_bind
<222> 91188..91206
<223> 99-13652-308.mis complement

<220>
<221> primer_bind
<222> 133979..133997
<223> 99-13671-396.mis

<220>
<221> primer_bind
<222> 133999..134017
<223> 99-13671-396.mis complement

<220>
<221> primer_bind
<222> 140047..140065
<223> 99-13649-286.mis

<220>
<221> primer_bind
<222> 140067..140085
<223> 99-13649-286.mis complement

<220>
<221> primer_bind
<222> 141157..141175
<223> 99-13648-259.mis

<220>
<221> primer_bind
<222> 141177..141195
<223> 99-13648-259.mis complement

<220>
<221> primer_bind
<222> 144014..144032
<223> 99-13647-278.mis

<220>
<221> primer_bind
<222> 144034..144052
<223> 99-13647-278.mis complement

<400> 1 

caattaaagt tttgttcact ataagtcttt tttggaaaag agagagaaac attcaaatta 60 

tttacatacc agtttccatt agcatgtgaa gaacaaacag aaacactttt cagggtgaac 120 

aaaattcctg ctacagttat aaaatcctgc atatactctt tactttgtga ttctgaaaaa 180 

caccgttcta cctggtttat tgaaatgtgt gaaagctcta atgcaatgtt attttttaca 240 

ttttgtaaca cttaagtcat aaagccaagc tattctcaaa ccttgatgaa acatgttgga 300 

agaaattatg ttttagtgtt tggtgaaaac attatgtttc gtcacttaag gtgataaatt 360 

gtactcatta aagaactttg aaagttcaca catagccaat ggtttaaaat gcactaattt 420 

agattcccaa ttctcacaaa ggccagttac tctggaccat tcaatcgcca aagaggaaaa 480 

ctgggggcat tcccatcccg ggatatggga agtccccgag cttccagcct ggtccttgtg 540 

gccgaaaaat ggcatgtttt gcttttgctt ttggatcctg ttcgtgccgc caaaaatgtt 600 

ctgtgtgggg aaagtgcgag gggagagaaa agacacggac atgatacgtt taagggtaaa 660 

caacgtttat cccatgtaag tggccatgca gatatagtaa gcaaatgata taataataag 720 

caaatgatat aataagcaga ttgatataat aagtagattg caatggaacg gggaaaaggg 780 

aaaatacatc tacattcacc agactatgga ggattcaaca acagactggg acgcaacagc 840 

ctgggctcca gagtcagata ggtaggcaaa gagatcctag ttctatacag atacgtacca 900 

tggagcagtt ccactttcct aagcacattc agttgtgata aaaatagatg agtttcaagg 960 

gctgatacat tacatgccac actcaaagtt gtgttgttaa acaatttcaa ttgttgttac 1020 

aatttcaaat aaaagcaatg tttacaacca tgggttcaag agaagtctaa gtgaacacat 1080

ataataaaga cttgcaaaat aataaaagat aaggctcttt aactatcaaa agacttgcag 1140 

aaaagaacca cagaaaacca ttttaaatat aactgccttc gtatgtaaga aattctacat 1200

tatttttgat gttaaaacat caatctcatg cttactaggc tatttcttaa tgacacatgt 1260 

atttacaaat ttgagagaag aggaagaaat atcaggtgac accactgggt taatgcataa 1320 

atgacaaacc taaatgcatt ttaatttcct tttctttaaa tcgagctgag cttcagcccc 1380 

ttctttttgt ggtgttctta gtcatctacc ttatcacagt aatcacaatg taagcatgat 1440 

cttctttttt ttttttaagt gcacaatatt tttaactgtt aacaatatac ctattgttac 1500 

ctatgggcac aatgatatac agcatatctc tagaatttat tcttgcaaaa ctataacttt 1560 

atacctgctg aacagcaaca ccccatttct ccctttcctc cagccgctgc aaccaccttc 1620 

tattctctgt ttctatgagt ttgactattt tggattcctc atataaattt aatcatgcag 1680 

tatttgtcct tccgtgcctg gcttatttca cttaacataa tgtcctccag gttcatcata 1740 

tgacaggatt tcttcttttt cttaatgatg aataatattc cattacatgt gtgtactaca 1800 

ttttcttcat ctttcaatgg acatttaggt tgtttctata tctggactat tgtaaataat 1860 

ggtgcaatga acataagagt acctatgtct cttcaagagc ttgatttaaa ttcttttgga 1920 

tatatgccca gaagtgcaat tgctggttta tatgataatt cgatttttaa ttatttgaag 1980 

actcatcata ctgtttttta tagtggctgc acaattttat attcccacca atgttgtaca 2040 

agggttccaa tttcttcata tgtcaccaat atttgttgtc ttttggattt ttttaaaata 2100 

aagtaacagc catcataaca aatgtgatat catgcttttg tttcatatgc attttcctga 2160 

tgattagtgt gttgagcacc ttttcatttt tatttattta tttatttata ctctaagttc 2220 

tgggatacat ctgcagaaca cgcaggtttg ttacataggt atacatgtgc catggtggtt 2280 

tgctgcgccc attaacctgt catctacatt aggtatttct gctaatacta tccctccccc 2340 

agcccccgac cccctgacag gtcctggtgt gtgatgtttc ccttcctgtg tccatgtgtg 2400 

tgagcatgag cttcttaata agaagtgatt caacactaca cactccaatg tgcttgttcc 2460 

tcagtcatct ctcctttgta gatctctatt atgccaccaa tgccactcct ccgatgctgg 2520 

ttaacttttt ttttccaaga gaaaaaccgt ttcctttatt ggttgcttta tccaatttca 2580 

ccttttcatt gcactggtga tcacagatta tcatatgctc acagtgatgg tgtatgacca 2640 

ctacatggcc atctgcaagc ctttgttata tggaagcaaa atgtccaggt gtgtctgcct 2700 

ctgtctcact gctgctccct atatttatgg ctctgcaaat ggtctggtac aggtcatcct 2760 

gatgctttgt ctgttcttct gtgaacccaa tgagatcaac cacttttttt ttttttggag 2820 

aaaatgcatt atatgcacat ttaattccac tataaatttt tgaatggacg gttggagagg 2880 

aagggagaaa tacatattaa cggagagaat accacccaga aagtatatac aatgggagaa 2940 

aggaacctgt tgatccaagt ttccatattc ttattatggc atataaggtc atgattattt 3000 

tctcagtatg aagcatctcc cagggctgac tctgatgtaa aattggagat caaccacttt 3060 

tattatgcag aaccacccct cttagtcctc gcctgcttgg atacttatgt caaagaaact 3120 

gccatgttca tggtggctgg ttccaacctc atctgccctc tcactatcat ctttatttcc 3180 

tacactttca tcttcacaga cattctgcat atctgcactg ctgagggaag gtacaatgcc 3240 

ttctccacct gcgggtccct tgtgactgcc gtcactgtct ttcaaggaac gctgtttcac 3300 

atgtgcctga ggcccccttc tgaggcatct gtagaacagg ggaaaattgt agctgctttt 3360 
tatatctttg tgagtcctac gttaaaccca
aaaagaacaa taagggaagt tatccaaaag 
agttgcaggt tatgtaatac attattttta 
caaatcactt tctgtcattg agtgtttttt 
agtgtatacc aaattattag ctagagttga 
tcatagaaat ttaaatagaa gacttatggc 
ttctttagta ctcatattgg tagcaaacga 
tcattacatt tttatagtac ttaataactt 
tagtatatga agttttctcc tttcttttat 
gtagggaaaa gagagatcag actgttactg 
cttcattttg acttgtaccc tgaacaattg 
tttgccccaa ccttgagctt ataaaaacat 
tagggctgtg caggatgtgc cttgttagca 
gtcatcgcca ttctccagtc tccataaacc 
ggacctctgc cctggaaagc cgggtattgc 
tatggcctcg tgggatggga aagacctgac 
tgtgctcggg aggattagta aaagaggatg 
tctgtctcct gcctgcccct gggaactgaa 
cttctattct gagataggag aaaaaccgcc 
aatgctgctt tgttattctt tactccactg 
ggcttacgtg cacatccagg catagtacct 
tttgctcaca tgttttcttg ctgaccttct, 
tcctcttgct gagatagtga aaatagtaat 
tgccgatgca ggtcttccat atgctgagcg 
atactttgtc tctgtgtctt atttcttttc 
cccacaggtg cagaggggca ggccacccct 
tgagatacag tcttagaatt atattttgag 
aagcaatgcc aaatttccgt tagtaggctt 
gcattcacat caatatgtga cctcacttct 
tttcaagtat gtcttaatat attgattttt 
cacatgtggg actacaccgt aatttgggta 
gtgtgggcct ccatttctct agaacgattt 
cacatgctta ggcagatgaa tcactgcagc 
agatagctct tcagtaggat ggtgtgaatt 
ctaccttttg acccagcaat cccattactg 
caccataaag acacatgcat gtgtatgttc 
tggaatcaac ctaaatgccc ttcaacagta 
catcatggaa tgctatgcag ccgtaaaaaa 
acggatgaag ctggaggaca atatccaaag 
ctgcaagttc tcacttacat gtggaagcta 
gaacaacagg caccaggacc tacttgaggg 
ctctgcctat caggcactgt gcttatcaac 
cctgtgacat ggaatttacc tttataacaa 
aaagttaaaa aaagaaatct gtcccaagga 
cttaatgaac tatggatttc atgcattctt 
gacagtttga tgcattttac atagtatcag 
gcttcaaaat aatagtaaat gggtagaatt 
aataatacaa catcagtgat gtagtgtcta 
taggctggat tttgataata atagcatgct 
taatcacaaa caagtaaaaa tctaaagggc 
gtatcatttg tgtagctaaa tccattcgct 
tgcagtggaa gaaaaatgaa atgaactaag 
gccttctgga ggttttctat gaaaaataac 
agtatcatac acatagctaa attctgtatt 
attgccttct ttcgtgtgga catctgatca 
ttgctctttc taaatgaaaa tagcccattc 
ttatagggct agttttattt ttgtcagaaa 
gtagccattg tataccaaca tataaaagaa 
ggaatttttt cttcaaagtg aaagcagtct 
ttatgcaaaa caaggtacct ctacattaag 
aatcttatga taccttatat tccatcttaa 
tcacccaggc tgaggtgcag tggtgtaatt 
tcaagcaatt ctccctgcct cagcctcctg 



9 

ttgatctacc gtctgaggaa taaaaatgtt 3420 

aaactgtttg ctaagtaagg tagatatttt 3480 

tcttaccaat taacgagcat tataaattaa 3540 

gtcttttgta acttgcatat gggaattgaa 3600 

cagtgtcatc tcagtgaatt taagaagaaa 3660 

atgtaaaagt caataaagaa cagtgattcc 3720 

taaaagacag aatgcaatgg aaattacagt 3780 

ccaaactatt ttctagacac ctttcaaaca 3840 

acagataatg caacaataaa gatcactgat 3900 

tgtctatgta gaaaaggaag gcataagaaa 3960 

ttttgtcctg agatgctgtt aatctgtaac 4020 

gtgttgtatg gaatcaaggt ttaagggatc 4080 

gaatgtatac aggcagtatg cttggtaaaa 4140 

a 9999 cacaa tgcactgtgg aaagtcacag 4200 

caaggtttct ccccatgtga tagtctgaaa 4260 

cggcccccag cctgacaccc gtgaagggtc 4320 

gcctcttata gctgagataa gaggaaggcc 4380 

tgtctcggta taaaacctga ttgcaccttt 4440 

ctgtggcggg aggcgagaca tgttggcagc 4500 

agatgtttgg gtggagagaa ggaaaaatct 4560 

ccccttgaac ttatttgtga cacagattcc 4620 

ccctattatc accctgttct cctaccacat 4680 

caataaaaac tgagggatct cagagaccgg 4740 

ccagtcccct ggggccactg ttctttctct 4800 

tcagtctctc atcccacctg acgagatata 4 860 

tcaattgaag tatatctcag aatactactt 4 920 

ccaatgaaat cttctttctt gaagcttttg 4980 

tataaatatc attgtttgca ttaccaggag 5040 

ccactctttc attgccattg aagcagatac 5100 

atcttctcat tgggggaaca tgggaagtgt 5160 

tttgtagtct taaggttttc atgaagcttc 5220 

gatgtgttcg ttttttatcc ttcacagcaa 5280 

agcatttaga cacatttgtg attcagggat 5340 

ttgggataat ggcacatact taaaacagaa 5400 

ggtatataca ccaaggaata taaatcattc 5460 

atcacaacag tattcacaat agcaaagaca 5520 

gattggatac aaaaaaatgt ggtatatata 5580 

aaagaatgag attatgtctt ttgtagcaac 5640 

caaaccaatg caggaacagg aaaccaaatg 5700 

aacattgaat acacatggac acaaaaaagg 5760 

tgaagtgtgg gaggagggta aggattgaaa 5820 

tgggtgatga aataacctgt acaccaaaac 5880 

agctgcacat ggacccctga acctaaaata 5940 

gactgttttc tcttaatgtg ctgcatcctg 6000 

tttcaagatt atattgccta cctgattgta 6060 

ttaaacatta aacataatta agagcatttg 6120 

tattatggtt atagtactac tcatacaaat 6180 

gtgagcatga cactattata gaacacttct 6240 

ataacttttg aataaaaata gtaaattgaa 6300 

cagtagtatg tattcaacta gcttaccagt 6360 

gtccctctag cagacacaca tgctagttat 6420 

gaattaatgt ctttgagtaa tataaacaga 6480 

ataagtatgt gtaaaactct tctttgagta 6540 

ttctattcat tgctgtataa ataaattaac 6600 

tgtgattcca tacttaaaat tacatatact 6660 

atctatactt ttcatctaac cattcagtta 6720 

attcactttg taaatcttat gttttattat 6780 

aatacataca tacatacaca catacattag 6840 

acttaaaaat tatccaaatc ttcaacattt 6900 

ggaggaggaa atgagggaca ccagttttat 6960 

tttttttttt tgagatgaag tctcgctctg 7020 

ttggctcatt gcaacctctg cctcctgggt 7080 

agtagctggg attacaggca cgcaccacca 7140 
aacctggcta agttttgtat ttttagtaga gacagggttt cgccatgttg gccaggctgg 7200

cctcaaactc ttgacctcag gtgatctagc ggccttggcc tcccaaagtg ctgggattat 7260 

aggcaagagc caccgtgccc ggcttccaaa atatttaagc aatattattg cattttacaa 7320 

ttttagtaat gcaaggaacc aaaataaaac agaataattg agatagagaa gactttacac 7380 

atcaatttga aataaacata taggaatgcc ctatattctc aaatttcatg ggatgtaaca 7440

aatctacatt ttgattttga tatatagcta tattttatta aatgatttct cagagtataa 7500 

cccaccaccc gcaactctaa kaaaattagg gatgattctc cgtcttggtc agactgtact 7560 

ttgatccatt tgtgctaacc tggaactata tgtgcactgg aagatacaga ctaatgaacg 7620 

catctctagg tccctttgtc ctccaacaaa tacagtgcta cacaattttc aaatattctc 7680 

attctatttg caattccctt tctaaatcaa caattttatg catcatcaat tttataaata 7740 

gccctggttc tgagaccttt gatgattagc atgttaatat taaccttgat agtgcagact 7800 

agttccaaaa taaatccata acgcccgtcc tccaaaggat cctgggccct gaccaatact 7860 

tctgcctcct tctactgatt atgccactct tctgctctca cagtctttga aatctttcat 7920 

cttctcaaat gtctcacctg ccacaacttc ctacccctca ctcattagat gacctcactg 7980 

cagtgtttat aggcaaaagc tgaagagttc tgatgagaat tccttgtttt catcatacta 8040 

aaaaggcaga agttatctac ccatgttttc ctcttccatg ttattgaaag ggatggctaa 8100 

tttctctgtt tgtactaagg atctcatgca cacttgtacc gaaactctac ataaatgtaa 8160 

aataggaatt ttatcctcgc aggagagtct akctccagga agcatcaaat gaaggttatg 8220 

ctcttgagga acaaagcata ttttaattgg ctctttagaa ttagtttaaa aatactctag 8280 

ggaagatacc taataaatat attgtaaatc ttgctacctt gttttcataa gatatggtcc 8340 

attaacatta acaggtaggt cattttctac atttgttcac taaaaataca atttgtgtgt 8400

gtgtgtgtgt gtgtgtgtgt gtgtgtgtgt gtgtgtagag ataagtattc catcaatggg 8460

gaaaacaaat attttctcca atttcactag acttttctat ccatatgtta gtattgcatt 8520 

gactctctac aatcaattat atgtaagctc ttttgatgta tacatgagaa ctaaataaac 8580

aatggcttta aggtggaatt tttggagtag aaatttttac aatcttatgt tactttcaaa 8640 

gagaactgag agggagtggc ttgagttgga agctaagacc ccttttaaaa attatctgtg 8700 

catacctaaa caatattgtc atgagtgggt tagtggcaca aaaatgggaa caaaaaaaat 8760 

ccacactttc cccaaatagt agaattgcat atctgctgtt taaatgctgc attcctagaa 8820 

agatcttctt tgagtggtca tctaaaagca tcctacattc atgttttctt cacagaaaag 8880 

tggtttgcct gaaattgtat gtaacttttt ctgtgtgttt aatctctgtt ccctctatta 8940 

taatgtgtgc tctgctccct ctattataat gtgtgctccc tgagctggga gacttgttta 9000 

ctgtagtaac cgcaggaatt ggaacagaaa gaaatgaagc tatacagtat acatgagtaa 9060 

acaggcagtg acattacaaa gtggaaaaaa acaagtcttc attttgtacc ctcttagcca 9120 

tatatcagat agaaattaat ttctctagtt taatcgttcc tgaataaagg taaggcacac 9180 

aactatgggt cttaattgaa aatgctttgc ttttctttct tcattttgta tctgaaacaa 9240 

tacaatatca gagctggagg tataataaag atcagacttc ctttatttat ccatttgaaa 9300 

gatgcaaata acctagggtt tttgtattta attttcattc ctttggattt tttgtttcct 9360 

cacgaagttt gaataaaatt accaaatgtg gagtacacca agaagacagg tataaatgta 9420 

ggaatgaata aacttatgta tgtatacatg tatggcagag agaaatagag aatatgtatg 9480 

tttgtgtaag ttatgtgggt ttgatgtata gaaagataca gattaaaaca gacatatagg 9540 

gagacaatgt tatgtaaaat ttccgatgtg attattgaaa caagagaagt aattgtcacc 9600 

tagataaata gatgaatgag cgaatgataa atggatgaaa caaatgccaa atctgaatca 9660 

gagagaaatc ctcacattct ttgtcacttt cagtttcaag agataagaag atgttctccc 9720 

caaaccacac catagtgaca gaattcattc tcttgggact gacagacgac ccagtgctag 9780 

agaagatcct gtttggggta ttccttgcga tctacctaat cacactggca ggcaacctgt 9840 

gcatgatcct gctgatcagg accaattccc acctgcaaac acccatgtat ttcttccttg 9900 

gccacctctc ctttgtagac atttgctatt cttccaatgt tactccaaat atgctgcaca 9960 

atttcctctc agaacagaag accatctcct acgctggatg cttcacacag tgtcttctct 10020 

tcatcgccct agtgatcact gagttttact tccttgcttc aatggcattg gatcgctatg 10080 

tagccatttg cagcccttta cattacagtt ccaggatgtc caagaacatt tgcatctctc 10140 

tggtcactgt gccttacatg tatggcttcc ttaatgggct ctctcagaca ctgctgacct 10200 

ttcacttatc cttctgtggc tcccttgaaa tcaatcattt ctactgcgct gatcctcctc 10260 

ttatcatgct ggcctgctct gacacccgtg tcaaaaagat ggcaatgttt gtagttgcag 10320 

gctttactct ctcaagctct ctcttcatca ttcttctgtc ctatcttttc atttttgcag 10380 

cgatcttcag gatccgttct gctgaaggca ggcacaaagc cttttctacg tgtgcttccc 10440 

acctgacaat agtcactttg ttttatggaa ccctcttctg catgtacgta aggcctccat 10500 

cagagaagtc tgtagaggag tccaaaataa ctgcagtctt ttatactttt ttgaccccaa 10560 

tgctgaaccc attgatctat agcctacgga acacagatgt aatccttgcc atgcaacaaa 10620 

tgattagggg aaaatccttt cataaaattg cagtttaggc ttgtgtttat ttgcagtcac 10680 

gaattgcttg tggagtaaca aactggcttt tgaaatggaa aaacctagtg tagtcgtgat 10740 

ttatttaaca tcatggactg tcagtaacca ctttactttc ttatccaaat gaaaaccttg 10800 

aagattgatt tcttagaaat aaaagcctta atgttgagaa atttaaaatg ttttatttgt 10860 

cagaaattct atgaaaataa attttttagt atctaataat tctatatgaa aatactatgt 10920 
ttacttactt
ttcatctcct 
aattttacac 
gctcagctac 
tgatacgtta 
ttcttttttg 
acaccttttt 
gcattccagt 
aatagggtct 
aataggggtg 
tttctgtggc 
agaaaatcaa 
accagatgac 
ggcattttaa 
atgtctgtat 
attaagcaga 
taagtctggt 
atttcggtga 
gttttggcaa 
gaaaagctgg 
tcatcggtgg 
attccatttc 
ggattttaga 
tttattttct 
ccttttagtc 
ttctatttgt 
ttctttggag 
gtcacagtgt 
agagaattat 
tgcccaaagt 
tctctttcta 
tggtggaatg 
ttatgtgata 
gttaacataa 
aattttgata 
cagtcgcaaa 
agtaaaattt 
catttcttct 
aatcacttat 
tatatagatt 
atgaccttaa 
atacataaga 
agcattcact 
atgtatctta 
atgttcccag 
aatttttcta 
aatatttgag 
ttatttgctt 
cagaataaaa 
ctcttcatgc 
aatgtgacct 
aagcacttat 
catttttgta 
tcctcccact 
ttataccata 
tatattttaa 
ggaaaaatgt 
agatatccag 
tgcatacttt 
ctgtaaatta 
ctgcttattg 
ggagaaaaca 
catttatttt 



ggggggtggg 
ttgaatatga 
tttttaaaaa 
ctttgtgctt 
aatttgctgt 
ttcatgtgat 
atgagtttcc 
tatgttttcc 
attaactgcc 
tatatatgtt 
ccattatttt 
tctcagagag 
tgcatattga 
tattttgcac 
gatatcaact 
tgacttgcag 
tctagccaaa 
caagaattta 
tgatggtggg 
gaattgagtg 
ggtatttctt 
tggcaagcag 
atgagtgcaa 
agttatgaga 
aaaatgactg 
gatttaattt 
ataggataga 
agtgctgatt 
tatctccatt 
catttggcca 
attcatactg 
ttggagatta 
gttatatgta 
tttataggca 
tacagaagac 
ggcatacata 
atgagggatc 
caataagttt 
catatacaaa 
aaatatttct 
tacaataata 
catagaagca 
tgtgttttat 
gtgtttttat 
aagtggaact 
tgggtaaatt 
aataatcatt 
ctttgatgtc 
attgaagcat 
tattgctgat 
tttgcttatt 
attcattgca 
ctgtgttgta 
atttattaag 
agtgaatcta 
actaaattcg 
tatcttcact 
tatttttaac 
atatatgcaa 
tttgaaaaac 
ttaagtttgt 
cttaaccttg 
tataccatac 



taaattttta 
tggtccacag 
atgaaattat 
attgtaatca 
aagtgttccc 
aatactggtt 
aagtgtatgg 
agtgccgtac 
ctgtgcatac 
tctggaggaa 
taaaataagg 
gctacctaag 
tacatatgta 
tctgatatta 
actatatcta 
tttttgtaaa 
tcctctttca 
acaattacat 
acaacagagg 
acacagagtc 
tattttatga 
aatggtctca 
aaaatactct 
attgctctgg 
acctgtttct 
tttccttctg 
acataaatcg 
ttaggaacac 
tacaacattg 
ctcagtagga 
tagttgttat 
tgaattgatg 
tcctaagaat 
ttataattaa 
atatatgcta 
ttcatattaa 
tgtcctgtga 
atatgcgtac 
tttgaacact 
gatgaacatc 
atacatatca 
acatttactt 
tccatttgat 
gactgatgta 
gctgaattaa 
tttctccaga 
tcactataac 
tcaagggaag 
atattgaaca 
aatttcttgc 
gctgttatgt 
atatttgata 
atattaaaaa 
caatgctttg 
tttctgccta 
tatgattagt 
tttctcttag 
ccataaatta 
atatatattt 
aawgaaaact 
tttccagaaa 
gtatattaat 
cggcgttcat 
actaacttag

agccataatg 

ttctacttca 

atgtgcaaaa 

caaacccatc 

atcgtccttg 

atccaattct 

ctctgactta 

ccataaatgt 

atactcagtg 

gtgttctagt 

gagacacatt 

tttagcccta 

actaacacca 

ctataaagtg 

atgggggcat 

gaaacagtat 

tttagcactg 

aaggtcatct 

ctcacttcct 

gaaatttcca 

tatccagatt 

atatttggcc 

cttgcagcaa 

aattgatttt 

catgtctttg 

acatgaataa 

tttattttaa 

gagcctgagg 

caaaagcagg 

taaagaaatt 

agattattgc 

agctaatgag 

aattgtctta 

tttaggctgc 

ttaattacca 

caaaggatac 

atgttactaa 

gattttcact 

atgttgaatg 

attgtagaaa 

ataatcacct 

tatcttgcaa 

tctgtgcatc 

atgttttcaa 

aagcttctag 

tctatcagta 

tcttagaaaa 

aatgaaggct 

atgaataatg 

aagttttcct 

gtgatatcag 

ttatcttttc 

tttttcaact 

tttcctcttt 

gtacacttaa 

gagactgcaa 

tgcatactat 

caatgtggct 

attttcctct 

ggcaggaaaa 

tctatcgatt 

aataaaacac 



cctgagagaa 

aggttcaaga 

atatggtgtc 

aatataaata 

ttgtaaatat 

attctgaaaa 

atcatcaaac 

ggactaattc 

cttgaaacag 

tttattttgc 

agttcttgaa 

aatccatgca 

ataaattgcc 

tacagaatat 

ttcaaatggt 

attcttgagt 

ctgaaaaaga 

tttgtttttg 

gcagagacaa 

ttttgagggc 

tctattaggc 

tgaaacccta 

taaataactg 

ccatttttaa 

taataaaata 

ggtaaaagtc 

aagtaattat 

tcttcataaa 

cacggagaga 

tttcttggtt 

ctcaaaacac 

tatagatgaa 

aatttccctt 

acgtccatgt 

attttattaa 

ggtacagaca 

attttcttta 

aatgcatcaa 

tgatataatg 

tcatataata 

actgactgaa 

acctgaagag 

accatgattc 

tctaattgtt 

tgcttttgaa 

caacttaaac 

gtaggagctc 

taattctgtt 

acacataagg 

tttgtatttc 

catcctaaaa 

gaattatttt 

cctcttattt 

gatttgtgat 

atgtttggcc 

aatatgtttt 

atcatctgat 

tgttcaattt 

atacatgaaa 

ctatgggggc 

cgttggtggc 

cattaacaac 

tttttttcaa 



attctgaaaa 
tagggtggat 
ttgaacaata 
tatgggtcac 
cttgagaagc 
tcaactctca 
tactttaact 
agtaggagtc 
tatacaaata 
ttcatgaagc 
gctgctatag 
ttattagtca 
tcatatgttt 
gctacagaaa 
tttccagaat 
ctcatttgat 
aggaaggcaa 
tatttgcttg 
caatgaagca 
cagcatcatt 
aaagcattgt 
tttgaaaatt 
tacgattcca 
actttgacat 
ttttatttat 
acctcagaaa 
aagagttgca 
tgaaataatg 
ttaaatttct 
tcaaattctc 
tgcaaatgct 
atagtggtta 
gaaaattccg 
gaagggagaa 
tatattttcc 
aatatgaggg 
tattttgtta 
gcaaatgtga 
tatgcacctt 
gttagatcat 
ccatattgac 
agctgaactc 
catcatatca 
ccttttgtgt 
actcctgcta 
ctgtactaga 
ttgtttatgg 
ctaataagca 
ttgataattt 
tatttggaaa 
ttgaaaaaac 
agatataata 
ttaaccaatt 
gtcaccttta 
acaatataca 
aatttctgat 
gattgctttt 
agaaactata 
tacttaaaca 
cataggaaaa 
ctgtgagtaa 
tttatacaaa 
caaatatctc 
taaatacaga
gttgacatgc 
ttaattagaa 
ttcctaagaa 
ttaaaaataa 
cacaattact 
ataaataaat 
acgcagaaaa 
aagacagctg 
ctagaaataa 
cagctattcc 
catggtgttc 
tcccacttag 
tgcatggttc 
ctggtaaagc 
catcagcctt 
ttatacagct 
atatattatg 
ataaagagaa 
gctattacat 
caggatatat 
aaaagcaagg 
ctggtttcca 
aaattccata 
tcatattaga 
aacagtgtca 
ccacggcaga 
ctaatcttgt 
aactacctcc 
gacacggttt 
ttgtagtttt 
gtaatactgt 
attgtggagt 
tgtacctccc 
caaacttatc 
ctccggaacg 
tttgctagtg 
ttcctgcatt 
agagcataca 
tgtaaaatga 
tcactgcaac 
tgggattaca 
ggttcaccat 
agcctcccaa 
tatcacctag 
ctgctttagg 
gtcttattct 
atgacagtgg 
tgagagatag 
ttctaatcac 
tttcaaggtc 
aacatttttt 
tatcaataca 
atatccattc 
gctcttttag 
atcttggaag 
aaaataggtt 
catatttaac 
aataattaaa 
atttacctac 
caaaaatgct 
gtgtttctgg 
aattttgtaa 



ttcttttctc 
ccacatatat 
ttaaaataaa 
tgggaatact 
gacccaccct 
gaaatatgtt 
gttgaatact 
tagtggcagc 
aaaaagtcta 
atacaatatt 
aaatttaaga 
attttataat 
caccgtacca 
ttacttagat 
acaggcatct 
gatcatgaag 
attaaagtga 
tgcaaagaca 
acaaaactta 
tgagaaaata 
aattacagga 
agttaaaaac 
gaacagaatg 
acccaagggt 
agtgtaaggg 
cactgaaagg 
catccaactt 
ggggaataat 
ttctacaacc 
tttacctggt 
gcctggattg 
gacgtcctat 
agtagtttga 
ttcttagcct 
ttccagtttg 
aagacctcta 
ggtcagtaga 
cttggatcct 
aagacttctg 
agtcctgctg 
ctctgcctcc 
ggcatgtgcc 
gttggctagg 
agtgctggga 
ttgttgtaac 
ataatgggaa 
ttagctgtaa 
ataaggcatt 
gtaaacctat 
ccaaagcttt 
gctcacagta 
aactccctga 
ttcagcctga 
ctacacctgt 
ggtgtagcac 
gagaaaaggg 
atttaaatat 
cactccatgt 
ggtatttggt 
cagtttacac 
gctacagtta 
gtagttggtt 
cactttttgt 



taactatgct 
tcttctgcta 
tattaaacat 
tcacagtaat 
acagtctggc 
atgttaatat 
atacgtagaa 
taaaaacatg 
gcacgttcat 
attaatattt 
atctactgaa 
caagaaaaag 
tgtaacagaa 
attacaaaat 
accagatggg 
gaaagtgtaa 
ggatgttcct 
tatccaatgc 
ttaatagcta 
agctaacaaa 
tgcttattta 
ttgttgtaga 
accagttagc 
gaggtggggc 
gagcagtttc 
gatttcttgg 
ccctgtgttc 
ggaggaattg 
cattctgtat 

ggagtgacac 

ggttgttgta 
gggatctcct 
tttcatcttg 
gttgacttag 
atggaatcat 
ggcccacaga 
ggtgatggta 
ggctatggaa 
gagaaatttg 
tgtcttccag 
caggttcatg 
accacggctg 
ctggtctcga 
ttacaggcat 
tgttacttca 
acatggtaag 
aatgagtgcc 
ctgggagtcc 
atccagagga 
tggattcctc 
ggccatcctt 
tttgtaacat 
tccaacttta 
tctccagaat 
acctccttgt 
attaaagtct 
ttggctattc 
aatatgtaat 
caccataatt 
aaacaggcaa 
aaaatctgca 
tactgaaata 
aacattttgt 
gactttttgc
ggaatcattt 
ttaaataatc 
actgttgctt 
aatgtatttg 
aaatcatttg 
aattgattaa 
aagtacaata 
acattggttg 
taaggagtca 
ggcataatta 
caaacaaatg 
acctttctat 
ctcaaagtga 
gtagaaggtg 
agtacagttt 
agttaagaaa 
tacatatgga 
tgtccaaatt 
aactttgtat 
taatagtaga 
gcgtgatggc 
aacttctcac 
acactcttgg 
actttgactg 
gcctgcgttc 
cagggcactt 
gcagggcttg 
tccctttacc 
aaatcttcat 
gtttctcact 
gtattccaca 
atagtctggg 
aggcaggaag 
ctttctgtct 
acataatgtc 
agtggcacca 
gaaacagtac 
ccccaggcct 
aggcctatag 
cgattctctt 
gctaactttt 
actcctgacc 
gagccaccgt 
aaaggctttt 
acaagtatat 
ttggtcagag 
acgaatggta 
agtgtctaat 
cctctacatt 
tgatccatat 
tgagtgcaga 
tattcctccc 
tctgcttata 
agacttttgg 
gaatgtattt 
taagtccaaa 
tatgtcttga 
tttttctaaa 
agcaaaaaca 
tatactcttt 
ggtgaaagct 
aacatttggt 



caacaactcc 

caaacactga 

agagaaaaat 

taatttgaca 

tcacaaataa 

aagcactaaa 

aatacaaaat 

ctgtttatgt 

attaatggta 

aatgtttcta 

gaaacttgtg 

atgttcgaaa 

atggcaaacg 

ggactgcctg 

tttctaaata 

aatttttttt 

aaggtaaatc 

tatcttgtaa 

ttccatgata 

gcccagttta 

atttaaaaat 

agaagaggag 

agataagaat 

agcacagaaa 

tgttgcttct 

tctagtggag 

cccaaggggc 

accccttatg 

ttcagcaagc 

tcctgatggg 

gaccttaatc 

aatactcttc 

tcagtcaccc 

agctcaaagt 

tctggtggca 

acagaaacag 

cttccacttc 

cacatattgg 

gcattttttt 

tgcagtgtca 

gtctcagcct 

gtatttttag 

tcaagtgatc 

gcccagcccc 

ccacctttct 

cccatgagca 

gcaatggtct 

atcttggcag 

gtgactaatc 

aaaccaggga 

ttcagctaat 

atctctgctt 

accattatcc 

taaactagaa 

cttacaccaa 

cctaaagaag 

gctactcttt 

catacagaac 

aacggagaaa 

aaagcacttg 

actttgtcat 

gtactgtaat 

gaagaaatta 



acccaaaaga 

aaaaacccta 

tactctgtgc 

ataaaactat 

gacaaagtgg 

acatgaccat 

gctaaaatta 

gtctatattt 

taaaattttt 

taattttatt 

tgaagagttt 

tacaagtagc 

tgtcttgttt 

ctttggataa 

actagaatgt 

tcttagaatg 

attttgtgta 

aatatcaggg 

agcaattgtt 

attcagatct 

aaaagtagga 

gtgctcagtt 

gcctaggtaa 

ctgggaaaag 

ttcccgacca 

aaaagagagc 

ctagttctgt 

gtcagttcat 

accttagcag 

tctgggccat 

acaggacatg 

cttaacctcc 

cagccaacac 

ggccaggtgg 

aagttcctcc 

gaaggaaaat 

caccccttga 

atgcctactc 

tttttttttt 

cgatcttggg 

cccgaatagc 

tatggatgag 

cacccgcttc 

tgtaataaag 

atcaatccat 

tgagcccact 

gtggaatacc 

aagcattgca 

cactctagaa 

agatctggca

aaactattag 

agtgggccca 

cacaccctta 

aactcaagca 

ctagcaccag 

aagaccaacc 

aaaaggaagc 

taggtaatgc 

tgtgcaaatt 

tcaaggtgga 

tcttaaaaat 

gtcattttta 

tgtttcttca 
cttaaggaga taaattgtgt tcattataga actttgaaag tgaacgacca aaccaaatgt 18540

gcaaaaattg actgaatatg aattacttta cacttatgtg tgaaaataat cttcaacatt 18600

atgtattcca tacacatgtt tttggtgttt taaattatct ccgggctttc ctaaattcaa 18660

gttggagaag cagaattagt tagatttata gcacactcac tatagaaaac atattttgaa 18720

aacttgtttt accacatctc tgttcagctg gatggggctc ctgtgaaaga acaaagcaat 18780

tacatagaat tttacacaga attgtaaaac ctgcaaattt tttttaacat tttctttacc 18840

ttatctgata gcacagcaca tacgttgcaa ctcttactct gaaaactact atcacttctg 18900

aatgtttatt aaataacagc cataggtgtg cgaaggattt acaaatggat agccacattt 18960

actgccaatg aaaaacaaag tttatttaac atgtgcagta tctttgtgat atatatatta 19020

ctccttattt ggtacagatg aaaacgctga agctcagaag aatgaaataa tgttttccag 19080

gaacacagca agagctaggt ttcaattaaa atattattct cgtagctact tctgggttga 19140

ctctagaaac accccaagcc acatgggttt gttaagatgt ttaactgggt aattattttt 19200

atgggtaatt atgtttcaga cagttttaca attttacaga tataaaaata tatatgaaaa 19260

agatatatat atgttgtaat gcatatatat atattcatat ttcatatgaa gaaggcacaa 19320

gaggaacact cagtgcctag gtaaaaagaa tggagttgga tagtaggtta atagtggttg 19380

atgagataag aagtagtcat ggtcatgact tactttcctg gtgtcatgca atttctattg 19440

acagcatttc ttcaaagttg taatttttat tgagaataaa ttaataaaac tctgtaagat 19500

tttaagataa aagatctccc aatattttca aagcatcatt acaaccttct ttaaagtttc 19560

tagcaattgt atcaatttct ttattcttat tttcattgta gaaaaaaatt gatttttgag 19620

gtatyttaaa agtcatagag aaaatatttg gccatggaaa gggtccaaaa tgggaggagg 19680

aaattaaaaa ggactgagat ttctgaaaca catacattgt tccaagggag taccaagcat 19740

gctacaaagg aggcagagga ttgaatgact gaagaaaaga ttttgaagga cactttgagg 19800

tgttcttcct acaccttcca tcatcctatc tcaggagggt gaatggattg gagagattat 19860

aaaaagagtt attgtgatgg ccttgatgtt agtccaaaac atccttttct agtctatgtg 19920

attgataagg atgctgaagc tgttcatggc tgtggaaagg gaaggggaga ggagagttat 19980

cccattgctg tactgtatct gagtgacttc acctcagagg aaaggaacag agaagcatct 20040

ctaaaccaca ttctcaaatc caaattcacc agcaataatg ctagtaattt tcatatctga 20100

ctcccatgtg tagactctaa ttgagtcagg atccaaaagg agggcaattc aatggctcct 20160

ttaagttcta acatgatgaa tgtagaagta ttccaaattt ttagctaaaa taatatttat 20220

cataatagtt attattactg ctactaactc actgcaagac ttatggaata actactatat 20280

gatgctctgg ccaggatctg agatgtccac aattctactc atatgtttta gctaaagaga 20340

atttaatgaa aagacagaga ataaaaaggt aaagttaaag gaatcagcca aaggggttaa 20400

aaaagtcaag gctaactgta gaaaaccatt tccatcttta ggtctgaagg accaaggaca 20460

gtggcttctg tcacagagct cagtgagagc tgaagccagg aagaaatccc ttgccaacat 20520

cttcccaaga cacacatccc aagacattct ccagggaggg atgcagccaa tttagcacac 20580

ttrtaatgaa gcaggaaggg aatgagaaag aaattgaggt aattccctat ctttcccact 20640

tctgacatcc tagagagtct tctcaaaagc caaggtcagc tggaaaccag acaccatagg 20700

tgccctgttg accaggtcca taggagccca actccaggtg aacaatgcaa ggcacagaag 20760

agaagacaat agatgtgaat gggagtaggg tggaaaagga gaatcaccag cacacctata 20820

ttatctataa atccaaaatg ttccacactt acacattact atccctatta aaaagataaa 20880

tgtacaaaac tcaaaattga tatatcacta gttcaagttc aacaaaagtt agtttatctt 20940

ccaatggtgt tccagactct gaaaccttct cccgatacca tccactatac ttgctgtaag 21000

caggaactta atatttccca aagtctaaca tattaaatta tttgggagta aagtaaaata 21060

aataatgaat cccattccta catcactctt gggggatgcg attccctgaa gacagtgtgg 21120

agttggcaaa actcatgtgc tccgtgagct taggattttc agaagaatcg aaatattgtg 21180

gttgttcata atttgaatga ctttttgcta gaaataattg acctgcaggg atttggtctc 21240

cacacagtaa ttgcattata aactctacag gttctcaata cccattctgt aatcatgaaa 21300

tcatgcttct ttctcaggta ggattagata tattttttaa aaagtttaaa atccggcagt 21360

aaatatttta attgatagaa ataaatggag ctgaaacata ttctgatgaa ttagaaaatg 21420

tttatattaa agtgttgcta ttcttttcag agtactggtg tttacagagt gtttgcataa 21480

tgaaaagaaa ataggccaaa agaattttta tttcatcaat aattattgac tgcctccaat 21540

tcaggaggca caaatctagg tgctgctgag tctataacta gatcagataa aaatccttga 21600

ccctaaagga gctcatgctt tgatgttgag tttcgttcta gactagacct ggatcattca 21660

ccaaataata ctagtagcct caatttgtta agcatgtagt acatgacaga tatattttaa 21720

aaacttttac tgaattaact ttctcatcaa gagagaatgg ctattaatca aattttatgg 21780

ataagaaaac tgaggcacac agaaattaat aacttgcccc aggccacaca ggtaggaact 21840

agaggcactg ggtttttcac ccaggaagtg tgactgcaaa ctcccagtaa ctgtcccact 21900

ctgctggctt tcaatagtaa actatgctta ctgtttatac ttgatttaga acaatcaaga 21960

tctggtatta aatcatcctt ctgattcatt ttgtgttaac aatgccactc tttactatta 22020

taagagctgt atcatgtaat attttatagc cctttcataa agaatgggct acttttaggt 22080

ttatagccta tattacttaa aaataataaa aatatgtttt gtgaagctgc cttctctaga 22140

tttctttttc cactccaaat tgaaccatga ccactttcca aggagacacc actcttgtta 22200

aagacaaaga ccaaccttct agactctttg taggacagaa tagaagaagc aagcattcac 22260

atgggaatgt ctcagtgtag ggtgcatgtc aggagagcac aggctcagat gcctccaaag 22320

ccatttctgt ccaattgcag aagcttctcc ctgagatttc cttgctcaaa gcagtaggat 22380

attgtgttct tgttttttca gaagaaagac tatttccaca aaattctcag ttaaattatt 22440 

tggactattt attgcttgca ttacttcaac ttccaataga aaaaaaaaaa gactcataaa 22500 

aaaacagatg aattgagaag ttattttgaa gggtgattaa tatttcaata aaaaagccca 22560 

ttaaaaatgt aattaggcgg ggcacggtgg ctcccgcctg ttatcccagt actttgggag 22620 

gctgaagcgg gtggattgcc tgaggacagg agttcaagac cagcttggcc aacatagtga 22680 

aaccccgttt ctactaaaaa tacaaaaaaa ttagctgaac gtggtggtgg gcacctgtaa 22740 

tcccagctac tcgggaggtt gaggcagcag tattgcttga acctgggaga cagaggttgc 22800 

agtgagctga gattgcacca ttgcactcca gcctgggcga caagagtgaa actccgccac 22860 

acacacacac acaaatgaaa tccagcacca aaacagtatt tgcctatttg gcaattaaat 22920 

gtaccatcaa acagagggat agagaatgtg gtatattata tatatatcat acattataat 22980 

gtgatataat atataataca ccacagtttc tttatccact tgttgtttga tggatatata 23040 

catatatatt atatgtaata tatatacaaa atgtatatga tggaatacta ctcagccata 23100 

aaaagggatg aagtaatggc attcacagca acttggatgg gattggagac taatattcta 23160 

agtgaagtaa ttcagaaata aaataccaaa tatagtgtgt tctcactcat aagtggaaac 23220 

taagctgtca ggatgcaaag gcataagaat gacacaaagg acttcgggga cttgggggaa 23280 

acggtggagg aggtgaggga taaaaggcta cacattgggt tcattgtata ctgtttggat 23340 

gatgggtgca ccaaaatctc acaaatcacc actaaagaac ttactcatgt aaccaaatac 23400 

caccttttcc cccaaaacct atggaaataa aaaataaaaa taaaaaccaa aagaaattca 23460 

attggaccct gtgagcttta acaaggtaat tatgattcta ttgatttagg tgacttatct 23520 

tctaacttat taccctaggc agaagcccaa tgtccttttg tgtagaatga gataatacgg 23580 

tggacaagtt ttgaacattg aatttacaga gtgctttatt tatgaagtga cctgtttcca 23640 

gtgttggcat ggtagttatt aaaaaaaagt tagtataaca ttgtcaagtc tggatatgtg 23700 

tcatagaaga aacacgtcga ttgttccatc atgatcttca tggtcacttt tatcttgccc 23760 

aaatgttgag agattcaggt caggatattt taagtatggt ctcagtcatg ttgttaatat 23820 

gattatgatt tcactttgat ctgttacatg ttttaacaaa ttcatattga acttgtgttc 23880 

caatgttttt tcttagtttt tcaagtgtta gaatgaggta gagttaaaac aacagcattt 23940 

tagtttgagg ataagtttct tcatcatttt atccttattg cccccctccc caggagccct 24000 

ttgatacact attgttttct ttatatccct atcaacattt aataccatca atacaggtgc 24060 

acttgtgtat ctatatacca tgcaaatgta tatggtttat gtattatgtg catatgtatt 24120 

ataaatacaa atatgtaata tctactttca ttccccaaga accaattttt gcccccttgg 24180 

acaaaaatat acaaaatgaa gaaatgtagc ttggtcttta aagttaaaaa agatttgttg 24240 

ataagagggc cagctgaggt agaataaagt gaagaaaatg ggcacacatt agagggtgag 24300 

ggaagatctg gaaaatgatc tccagttcac aggggcaggc aaagcgattc ttgttctaca 24360 

gggatgtata ccaggcatca gtcccacttt cctaagcacg ttcagttgtg ataaacctgg 24420 

agggtttcaa aggctgatac tttagatccc acattcaaag gtgtgttgtt aaacaaagaa 24480 

ttacagtttc aaagaaaagc aatgtttaca accatgggtt caagaaaagt ctaagtgaac 24540 

acatataaca aagacttgca aaaagataaa agataaggct ctttaactat caaaagactt 24600 

gcagaaaaga accacagaaa accattttaa atattattgc ctttgtatat taaaaaactc 24660 

tatattagtt tagatgttta aagcatcaat cacatgctca ctaggctatt tcttaatgtc 24720 

acatgtattt acattttgag agaagaggaa gaaatagcag atgacaccac tggggtaatg 24780 

cataaatgac aaacctaaat gcattttaat ttccttttat ttagatgtca tttgaagcca 24840 

agcaaacaca atgttaaaga aaaaccatac agccgtgact gagtttgttc tcctgggact 24900 

gacagatcgg gctgagctgc agtcccttct ttttgtggta tttctagtca tctaccttat 24960 

cacagtaatc ggcaatgtga gcatgatctt gttaatcaga agtgactcga cactacacac 25020 

tccaatgtac ttcttcctca gtcacctctc ctttgtagat ctctgttata ccaccaatgt 25080 

tactcctcag atgctggtta actttttatc caagagaaaa accatttcct tcatcggctg 25140 

ctttatccaa tttcactttt tcattgcact ggtgattaca gattattata tgctcacagt 25200 

gatggcttat gaccgctaca tggccatctg caagcccttg ttatatggaa gcaaaatgac 25260 

caggtgtgtc tgcctctgtc tggctgctgc tccctatatt tatggctttg caaatggtct 25320 

aagcacagac caccctgatg cttcgtctgt ccttctgtgg acccaatgac atcaaccact 25380 

tttactgtgc ggacccaccc ctcttagtcc tcgcctgctc agatacttat gtcaaagaga 25440 

ccgccatgtt ggtggtggct ggttccaacc tcatttgctc tctcaccgtc atcctcattt 25500 

cctacacttt catcttcact gccattctgc gtatccacac tgctgagggg aggcgcaagg 25560 

ccttctccac ctgcgggtct catgtgaccg ctgtcactgt cttctatggg acactgttct 25620 

gcatgtacct gaggccccct tctgagacat ctatacaaca ggggaaaatt gtagctgttt 25680 

tttatatctt tgtgagtccg atgttaaacc cattgatcta cagcctgagg aataaagacg 25740 

ttaaaagaag tataaggaaa gttattcaaa agaaactgtt tgctaagtaa ggtagatatt 25800 

ttggtcatag gcgttggaat ctgttcttat tatctgacca attaatgaac atttaaaatt 25860 

aacaaatcaa tctgtcattg agtgtttttt gtcctttgta atttgcatat gggacttaaa 25920 

agtgtatgtc aaattattag ctagagctta cactgccacc tcagtaaatt gaaaatgaaa 25980 

gcatagaaat tcaaatataa gactagaaga cattgttcta gctctgtaaa aagtaatgaa 26040 
gtacagatga ttccattatt agtactcata gtactaatat atcatagtac tatatcatag 26100

tactaatata tcatagtact atatcatagt actaatatat catagtacta tatcatagta 26160

ctaatatatc atagtactat atcatagtac taatatatca tagtactata tcatagtact 26220

aatatatcat agtactatat catagtacta atatatcata gtactatatc atagtactaa 26280

tatatcatag tactatatca tagtactgta tcatagtact aatatatcat agtactgtat 26340

catagtacta atatatcata gtactgtatc atagtactaa tatatcatag tactgtatca 26400

tagtactata tatcatagta ctatatatca tagtactata tatcatagta ctatatatca 26460

tagtactata tatcatacta aaatatcata aaagacagaa ttatcaattg ctgagaaaat 26520

gacagctcat tacatttttt taaatgtttt tgtttgtccc tttgaatgaa tggaggatat 26580

ctacctaact tctatgtatt cagaatctca catgaggcag cacatttagc acacttgttt 26640

agttgtaatc ttctatgctt atcttcgtga gtctgtttga aaagttggat aaatggcaat 26700

gagtgactac atttgctaat tacattttat aatacttaat aagcctccaa gctattttcc 26760

agaaacttgc caaccatagt agatgaggtt tctgtttttc ttttatacat attatataga 26820

aatgaagatc ctttaggtgt actttggatt attactttaa gttatatcct tagaatgata 26880

tcatagatcc aatgaaaccc tatttcataa ggcttttcat ggaaatgcca aatttccttt 26940

agaaagagtt tttaattgtt tgtatgttgt accagtaagg attcccatca acatgtgaac 27000

tcaattcttc actcttttat taccattgcc tgtttccttt gaataaataa attgaattga 27060

atgttttatc cacgaatgca tgtggatatt tgtgtgtttt atgtgtgtat gtaggtgtgt 27120

gtgtgtattt tgattacaat ataacatttt ctttttttta attgcttatt caaatttttt 27180

tggtgagaat caaggtcgta tgcctttttt tttttttttt ttttttttga cagagtattg 27240

ctctgttgct caggctggag tacagtggcg cgatctcggc tcactgaaag ctccgcctcc 27300

cgggttcacg ccattctcct gcctcagcct cccgagtagc tgggactaca ggcgcccgcc 27360

accacgcccg gctaattttt tgtattttta gtagagacgg ggtttcaccg tgttagccag 27420

gatggtcttg atctcctgac ctcgtgatcc gcccgcctcg gcctcccaaa gtactgggat 27480

tacaggcatg agccaccgcg cccagctgct tcacctgttt ttaaattgtg gtgtttatat 27540

gttttcttat ttatttcaga tatttctcaa tgatataaac atgtttttgt tatatataca 27600

ggtgtagcaa ttgtatatat atacacatga attccatttt atctcttcta ttgtgtcttt 27660

tccaaagtta atattttatc attttaaatt ttctcaatat ttttttattt cactttattg 27720

tgtctttttt tacttttagg aagttaagtt tatgccacct atgatcatgt ggatttttac 27780

atatattatt aaattatttt catggtttcg cttgttaatt ttttcttatg aactattttt 27840

aatatacatc taaaatagaa aataatataa agactccttt gtaagcacaa ctaatcacta 27900

ttaaatcttc acagaatgcc ccatttgttt cagttatttc agaaatttaa aacttacaaa 27960

gatacatctc tcctgactga ttctgtctct tgtcccattt cttgctgtaa actctattac 28020

aaatttaatg tttcttatcc ttctgtagag ttacatcatt ggataggtgt ttatatattc 28080

atagaaatat gtagtactga tttgctacat ttaaaacttg aaataaatgc atccactcta 28140

tgtatctttc tgcaatttat ttatcttgca ctaaatattg ttggatgtat ctgtggtgaa 28200

tcatgcagca gcaattttta gtggtatagt ctatgccaat atataaatat gcctcaatat 28260

gtgtatttat tctacgattt tttttttttt ttgagacaga gtctcactct tgttgcccag 28320

gctgggg£gc aatggcacga tctcggctca ccgcaaactc cgccccccag ggtcaagcga 28380

ttctcctgtc tagcctccta gtagctggga ttacaggtgc ccaccaccat gccaggctaa 28440

ttttgtattt ttaatagaga cgggatttca ccatgttagc caggatggtc ttgatcacct 28500

gaccttgtga tcctcccgcc tcagcctccc aaattgtatt ttatttctac tgttggtgaa 28560

gagttacatt ttccagtttc ttgataatat gaaaatccat taataaatgt tttgtgtgta 28620

attatgaatg tgtaccacgt aaggagttcc tctcaggtag gttggtaatt tcttggtcat 28680

aagatataag tatctttaaa tttactatat attgccaatt gttttactga tttatatttc 28740

tatcaacagt ttgtgagtgt tacctttaca tcactttatt cttgccaaaa ctaggtgttg 28800

tcagacttta aatgttttct aatctgatgg gtaaacaatg aattctcatt tggtgtttaa 28860

ggttgaattg ttattacaag tgaaactaag tatttttaag tatattgatg ctaatctcat 28920

atttctcttt cacttatatg tccatatact ttgttaactt ttcaggtagg gtgatttgtc 28980

tttatttcat acatttataa tagttcctta tgtaccaaat gctacataca aatcttttgg 29040

tgttatatgc catttaacct taaaaaccta gggtaataat gccaggcagt gtggctcaca 29100

catgtaatcc cagtactttg ggaggctgag gcaggaggat tgcttgagcc caggagtttt 29160

aggccaaact acacaagata gagactctgt ctttacaaaa aaataaaaaa aaattagcca 29220

gttgtggtgg gatgcccttg tggtcccagc tacatgggaa gctgaggcag gaggatcact 29280

tgagcctggg aggtagaggc ttcagtgaac catgttgtca ccacgacact ccagtctgga 29340

tgacagagga tggatgacag agcctggatg actccagcct ggatgaccct gtctcaataa 29400

aacaaaacaa accccaaaaa acctagggtg agatgaacag atgaacgtta tttcttgaat 29460

gcctcagcaa atagccatag gtatctattt ttctggtagc tgcaatttga ttgggacaag 29520

tttggaattt cagctcattt gacaagtatt ttcataatag atattttccc agtaggattg 29580

gatttttttt ttccacatag tgaatgctta tatgtgagca taaaatgttt cccacagatt 29640

gggattggat aaattatatg aagtgtgatc ctagtatatt gctttgccat tgaaatactg 29700

gttattaccc aaatagaaaa ttaacattta aaagagccaa caggctgggc aaagtggctt 29760

aggcctgtaa tcccagcctt ttgggaggcc aaggcccaca gatcacttga gttcaggagt 29820

tcgagactaa cctgggcaac atgaggaaac cctgtctcta aaaaaaccac aaaaattagt 29880

taggtctggt ggcaagtgtc tgtagtccaa gctactcagg aggctgatgt gggaggatgg 29940

cttcagcctg ggaggcaaag gttgcagtga gctgagatca tgccagtgca ctccagcctg 30000

ggtaacagag taagatcctg tcttaaaaaa ataataaaat aggccaggtg cagtagttca 30060

tgcctgtaat cccagcactt tgggaggccg aggcaggtga atcacaaggt caggagttcg 30120

agaccagcct ggccaagatg gcgaaacccc gtctctacta aaaatacaaa aattagccgg 30180

gtgcagtggt gggtgcctgt aattccagct acttgggagg ctgaggcagg agaatcgctt 30240

gaacccgggg ggcagaggtt gccatgagcc aagatcgtgc cactgtactc tagccttggt 30300

gacagagcca gactccaact caaaaataaa taaataaata aataaaataa taaaatataa 30360

tataataaga atgccaacag aataattctg atttcagcta cagtttatgc ttttttatga 30420

gaaattcaac tccaatcttt taattttatt ctcctttatt ctgttgttca gacattgcat 30480

attaaaaagg catttttttt cttaaccaat agttttatcc ttattgctgt ccttattctg 30540

gagagaactt catatttcaa ggcacagccc aaatatcaac tcccttgtcc tttcatgagc 30600

aaacagttgc acatacccat tcctgtgatt gtgtagatat tgcaactcta tttgtctgtt 30660

agtcacacag aaaattcata ataatacaat gccatgtttc ctaatatatt ctgagagttc 30720

tgtgagaaat gaaacctatt gtatcggtta agtactgaag atggccgaat aggaacagct 30780

ccagtctaca gctcccagcg tgagcgacgc agaagatggg tgatttctgc atttccatct 30840

gaggtaccgg gttcatctca ctagggagtg gcagacagtg ggccgaggtc agtgggtgcg 30900

cgcaccctgc gcgagccgaa gcagggtgag gcattgcctc actctggaag cacaaggggt 30960

cagggagttc cctttcctag tcaaagaaag gggtgacaga tggcacctgg aaaatcgggt 31020

cactcccacc ccaatactgt gcttttccaa cgggcttaaa aaatggcaca ccaggagatt 31080

atatcccgca cctggctcag agagtcctat gcccacggag tctcactgat tgctagcaca 31140

gcagtctgag atcaaactgc aaggcggcag cgaggctggg ggaggggtgc ccaccattgc 31200

ccaggcatgc ttaggtaaac aaagcagcca ggaagctcga actgggtgga gcccaccaca 31260

gctcaaggag gcctgcctgc ctctgtaggc tccacctctg ggggcaggac acagacaaac 31320

aaaagacagc agtaacctct gcggacttaa atgtccctga cagctttgaa gagagcagtg 31380

gttctcccag cacgcagctg gagatctgag aaagggcaga cttcctcctc aagtgggtcc 31440

ctgacccctg acccccgagc agcataactg ggaggcaccc cccagcatgg gcagactgac 31500

acctcacacg gccgggtact ccaacagacc tgcagctaag ggtcctgtct gttagaagga 31560

aaactaacaa acagaaagga catccacacc gaaaacccat ctgtacatca ccatcatcaa 31620

agaccaaaag cagataaaac cacaaagata gggaaaaaac agagcagaaa aactggaaac 31680

tctaaaaagc agagcacctc tcctcctcca aaggaaagca gttcctcacc agcaacggaa 31740

caaagctgga tggagaacga ctttgacgag atgagagaag aaggcttcag atgatcaaat 31800

tactccaagc tacgggaggt cattcaaacc aaaggcaaag aagttgaaaa ctttgaaaaa 31860

aatttagaag aatgtataac tagaataaca aatagagaga agtgcttaaa ggagctgatg 31920

gagctgaaaa ccaaggctcg agaactacgt gaagaatgaa gaagcctcag gagccgatgc 31980

gatcaactgg aagaaagggt atcagcaatg gaagatgaat tgaatgaaat gaagcaagaa 32040

gggaagttta gagaaaaaag aataaaaaga aatgagcaaa gcctccaaga aatatgggac 32100

tatgtgaaaa gaccaaatct atgtctgatt ggtgtacctg aaagtgacag ggagaatgga 32160

accaagttgg aaaacactct gcaggatatt atccaagaga acttccccaa tctagcaagg 32220

caggccaaca tccagattca ggaaatacag agcacaccac aaagatactc ctcgagaaga 32280

gcaactccaa gacacataat tgtcagatac accaaagatg aaatgaagga aaaaatgtta 32340

agggcagcca gagagaaagg tcgggttacc ttcaaaggga agcccatcag actaacagcg 32400

gatctctcag cagaaactct acaagccaga agagagtggg ggccaatatt caacattatt 32460

aaagaaaaga attttcaacc cagaatttca tatccagcca aactaagctt cataagtgaa 32520

ggagaaataa aatactttac agacaaccaa atgctgagag attttgtaac caccaggcct 32580

accctaaaag agctcctgaa ggaagcacta aacatggaaa ggaacaaccg gtaccagccg 32640

ctgcaaaatc atgccaaaat gtaaagacca tcaagactag gaagaaactg catcaactaa 32700

cgagcaaaat aaccagttaa catcataatt acaggatgaa attcacacat aacaatatta 32760

actttaaaca caaatggact aaatgctcca attaaaagac atagactggc aaattggata 32820

aagagtcaag acccatcagt gtactgtatt caggaaaccc atctcacatg cagagacaca 32880

cataggctca aaataaaagg atggaggaag atctaccaag caaatggaaa acaaaaaaag 32940

gaaggggttg caatcctagt ctctgataaa acagacttta aaacaacaaa gatcaaaaga 33000

gacaaagaag gccattacat aatggtaaag ggatcaattc aacaagaaga gctaactatc 33060

ctaaatacat atgcacccaa tacaggagca cccagattca taaagcaagt cctgagtgac 33120

ctacaaagag acttagactc ccacacatta ataatgggag actttaacac cccactgtca 33180

acattagaca gatcaacgag acagaaagtc aacaaggata cccaggaatt gaactcagct 33240

ctgcaccaag catacctaat agacatctac agaactctcc accccaaatc aacagaatat 33300

acattttttt cagcaccaca ccacacctgc tccaaaattg accacatact tggaagtaaa 33360

gctcttctca gcaaatgtaa aagaacagaa attataacaa actatctctc agaccacagt 33420

gcaatcaaac tagaactcag gattaagaat ctcactcaaa accactcaac tacatggaaa 33480

ctgaacaacc tactcctgaa tgactactgg gtacataacg aaatgaaggc agaaataaag 33540

atgttctttg aaaccaacga gaacaaagac acaacatacc agaatctctg ggacacattc 33600
aaagcagtgt gtagagggaa atttatagca ctgaatgccc acaagagaaa gcaggaaaga 33660

tccaaaattg acaccctaac atcacaatta aaagaactag aaaagcaaga gcaaacacat 33720 

tcaaaagcta gcagaaggca agaaataact gaaatcagag cagaactgaa ggaaatggag 33780 

acacaaaaaa cccttcaaaa aattaatgaa tccaggagct ggttttttga aaggatcaac 33840 

aaaattgata aaccgctagc aagactaata aagaaaaaaa gagagaagaa tcaaatagag 33900 

gcaataaaaa atgataaagg ggatatcacc accgatccca cagaaataca aactaccatc 33960 

agaggatact acaaacacct ctatgcaaat agactagaaa atctagaaga aatggataaa 34020 

ttcctcaaca catacactct cccaaactaa accaggaaga agttgaatct ctgaataggc 34080 

caataacagg atctgaaatt gtggcaataa tcaatagctt accaacgaaa aagagttcag 34140 

gaccagatgg attcacagcc gaattccacc agaggtacaa ggaggaactg gtaccattcc 34200 

ttctgaaact attccaatca atagaaaaag agggaatcct ccctatctca ttttatgagg 34260 

ccagcatcat cccgatacca aagcctggca gagacacaac caaaaaagag aattttagac 34320 

caatatcctt gatgaacatt gatgcaaaaa tcctcagtaa aatactggca aaccgaatcc 34380 

agcagcacat caaaaagctt atccaccatg atcaagtgga cttcatccct gggatgcaag 34440 

gctggttcaa tatatgcaaa tcaataaatg taatccagca tataaacaga accaaagaca 34500 

aaaaccacat gattatctca atagatgcag aaaaggcctt tgacaaaatt caacaacact 34560 

tcatgctaaa aactctcaat aaattaggta ttgatgggac gtatttcaaa ataataagag 34620 

ctatctatga caaacccaca gccaatatca tactgaatgg gcaaaaactg gaagcattcc 34680 

ctttgaaaac tggcacaagg cagggatgcc ctctctcacc actcctattc aatatagtgt 34740 

tggaagttct ggccagggca attaggcagg agaaggaaat aaagggtatt caattaggaa 34800 

aagaggaagt caaattgtcc ctgtttgcag atgacatgat tgtatatcta gaaaacccca 34860 

tcatctcagc cccaaatctc cttaagctga taagcaactt cagcaaagtc tcagaataca 34920 

aaatcaatgt gcaaaaatca caagcattct tatacaccaa caacagacaa acagagagcc 34980 

aaatcatgag tgaaatccca ttcacagttg cttcaaagag aataaaatac ctaggaatcc 35040 

aacttacaag ggatgtgaag gacctcttca aggagaacta caaaccactg ctcaaggaaa 35100 

taaaagagga tacaaacaaa tggaagaaca ttccatgctc atgggtagga agaattaata 35160 

tcttgaaaat gtccatactg cccaaggtaa tttacagatt caatgccatc cccatcaagc 35220 

taccaagggc tttcttcaca gaattggaaa aaactacttt aaagttcata tggaaccgaa 35280 

aaagagcccg catcgccaag tcaatcctaa gccaaaagaa caaagctgga ggcatcacac 35340 

tacctgactt caaactatac tacaaggcta cagtaaccaa aacagcatgg tactggtagc 35400

aaaacagaga tatagatgaa tggaacagaa cagagccctt agaaataacg ccgcatatct 35460

acaactatct gatctttgac aaacctgagg aaaacaagca atggggaaag gattccctat 35520 

ttaataaatg gtgcagggaa aactggctag ccatatgtag aaagctgaaa ctggatccct 35580 

tccttacacc ttatacaaaa atcaattcaa gatggattaa agacttaaat gttagaccta 35640 

aaaccataaa aaccctagaa gaaaacctag gcattaccat tcaggacata ggcatgggca 35700 

aggactttat gtctaaaaca ccaaaagcaa tggcaacaaa agccaaaatt tacaaatggg 35760 

atctaattaa actaaagagc ttctgcacag caaaagaaac taccatcaga gtgaacaggc 35820 

aacctacaaa atgggagaaa attttcacaa cctactcatc tgacaaaggg ctaatatcca 35880 

gaatctacaa tgaactcaaa caaatttaca agaaaaacaa acaaccctgt caaaaagtgg 35940 

gtgaaggaca tgaacagaca cttctcaaaa gaagacattt atgcagccaa aagacacatg 36000 

aaaaaatgct catcatcact ggccatcaga gaaatgcaaa tcaaaagcac aatgagatac 36060 

catctcacac cagctagaat ggcaatcatt aaaaagtcag gaaactacag gtgctggaga 36120 

ggatgtggag aaataggaac acttttacac tgttggtggg actgtaaact agttcaacca 36180 

ttgtggaagt cagtgtggcg attcctcagg gatctagaac tagaaaaacc atttgaccca 36240 

gccatcccat tactgggtat atacccaaag gactataaat catgctgcta taaagacaca 36300

tgcacacgta tgtttattgc ggcattattc acaatagcaa agacttggaa ccaacggaaa 36360 

tgtccaacaa tgatagactg gattaagaaa atgtggcaca tatacaccat ggaatactat 36420 

gcagccataa aaaatgatga gttcatgtcc tttgtaggga tatggatgaa attggaaatc 36480 

atcattctca gtaaactatc gcaagaacaa aaaaccaaac actgcatatc ctcactcata 36540 

gatgggaatt gaacaatgag aacacatgga cacaggaagg ggaacatcac actctgggga 36600 

cggttgtggg gtggggggag gggggaggga ttgcattggg agatatacct aatgctagat 36660 

gacgagttag tgggtgcagg gcaccagcaa ggcacatgta tacatatgta actaacctgc 36720 

acattgtaca catgtaccct aaaacttaaa gtataataat agtaataata ataaattaaa 36780 

aaacagaaac attatatcta tctccttgtt ttcatagcaa gtcatcaata catgtttcta 36840 

aattgagaaa gaataattac aattgaaggc acatagtgat gaaagaaaaa ttgttgattc 36900 

tgacatttgg agctgaaaat atttaatagc tacaatcttt aaaagtcagt atttgaaatg 36960 

aaaagttctt ttatatatgg ttgggtgatc tgttggataa tgttttattc ttcttaaagc 37020 

aaggtgtcct tgttctttgg acatattttc agtgttgtga gttgtctttc cttgtctctg 37080 

tctctttctc tctgttgccc tctctgtatc actgcctttc tctcaattcc tatttcataa 37140 

tttggtttag cttatattgt gtcttacaga ctcttcacat gttggcctca gccttacata 37200 

ctttcaggac aatctcagta cagacaaggc tgctgcttgt ttttggaaat gactcctctt 37260 

gcctgaaatg gctcaagtgc acagcaggtg tatgtgtgtt tgttggtgtg tatgtattgt 37320 

gtatgttcat gtatggtgct ggtggaggtg caatcataca acgagaataa tttttgctct 37380 
acaatagcta ctaccattta ttttactttt tgcagtaaaa tttttttctt acccactgct 37440

ctgttcatta acaaacttct agttaggtaa gaaggtgtta ttaactaaga gtttgttaat 37500 

gatagtttat cctgttcaga tgtgagaaag aatgttaaaa ctagtgtcta ctctattagg 37560 

agatacatta gatgatctca catgcacaat tcctatgtcc attttgcaaa catatttact 37620 

gcactgccta ttataatgca tagaattctg ttggtggaaa aatacaagtg tgtccagaca 37680 

ttttcgcact ggtttgctat gctaatattg ttatatgttg aaagtgcatg atttgagatt 37740 

gactgatgca ttgtcatttt ccccataact tgccatatta gatcactaca ttaatttatt 37800 

ttgcttgctg actgtctatt tgttctaaca ttatttttta aaacaatcag aaacttctta 37860 

gtagtatatg atactcgcac ataattgttt ggactaaatt agcagtttga ccatccttat 37920 

gaggatcaga acaaaatcag atcataagac atgacagaac acatatagct ttgagactaa 37980 

aaccttaaaa atgtgactgt tttccttctc ttttttgagg cttcttagag ttgaaagctt 38040 

catgtgtttt ctcttttctg tttctgttat actgcctgac agaaacatca ctttacttac 38100 

catgcttatt cagtgtacct gcttttcaag gaaaatgtaa tttgtgtgtt ggaaaggaat 38160

ttgaaaagtc actggtcaca tcagcccaca tggatactct ggattcttat ctccatagag 38220 

aaggccaata accatttgcc ttgaccacac ccaattccta cttaaaactc tttatctaac 38280 

atctaagttt ctcttcttct tctctccttc ttcttatttt ttttttttct tttaaccagg 38340 

cttccatgat actggggaca aaaaaagaga agataagaga tgaactagtg gctttctgga 38400 

ctctcaactt gaattatttg ttttaaaaaa gtatagggga aaatgggttt gaggaacacc 38460 

ttagctttcg atgcttaagc ttttgacttg ttgtaatcct gggtatggaa agaactctga 38520 

taactttgtg gctctgtttt ctatttggtt ttgtttttaa tgattctctg gatgttaaac 38580 

atctagcaag taatttgatt tgcttaattc tgaggaaaga aaaaatttcc tttgagttaa 38640 

atttatgggg taaattcact tacattaaat gtaggaactt taagtgtgca gttcagtaaa 38700

gtttgacaaa tgtgtacagc catgttatca caattccaat tttcaaggct actgtgcagc 38760 

tgggaagtgt gagaataggg caagtaaaga tgccacaaag ttcactgtcc ttaccaagat 38820 

tcaggaattt tttcttgaat aaatactcac ggaattgttg caagcttttg attactttct 38880 

agtatccaga aacagttgac ttttgacttt tttttagtgt tctcatggtt ttcatggagg 38940 

gacagttttg ggaaggtcat tactatgtca atcagaaagt tgattcttag atagatagat 39000 

agatagatag atgatagatt catcaagatt gtatgaacag tttgttcatt tttattgctg 39060 

agagtagtaa cccattatag ggatatacta taattggttc atctatgcat ctgctgatac 39120

atgtttgaga ctctgtatgg atatgtcttc ttttgccttg gataaatact gaagagcaaa 39180 

atttctgtgt catagtttta tgatgaacta taaatttata taaaatttta taaagggcca 39240 

ccagtttttc acagcccatt gaattatttt cattattata tcttgttatg tgttataaga 39300 

gtttcagttg cttcacatta tcattgttaa tctttgtatt gtcaatattt tttattcctc 39360 

ccattgaagt gagtgtgaag tgttacaaag ataaacaaag ccagctacca gttaaagtag 39420 

taaggacaga ttttattcag taatatacta ttgcaatgga caggtaggtt cactgtaaac 39480 

tgaactctaa ttttgttttc acagagataa ctgtgtattc taaaggtaga atgaggaaat 39540 

aggaagaggc atgagcaggg ctcaagagag tcagagaagt aaaaaaatta caaaaagtgt 39600 

gaaaggagtg ttgatccatg tgaaacccac ccagttttgc caactggctc ctaccctccc 39660 

ttgggagcta gtagacaaga gctctttctc cacatattgg ctgaaacaga cagtatcttt 39720 

tgttggcagc cttgagtttt ctcaggcagg ttcttccagg agtactacag tcatcccagg 39780 

gatatggcct tgagctgtta gccactgtgt ttgtgttttg ttcaagtctt tttaggccaa 39840 

ggttgaggcc tagtcaagaa agtgctcaga ggattctttc tagagtttgg tccaggagga 39900 

aatctttgtt atgcggtgtt tcataatggg cataatttac attttcttga caaatgcaga 39960 

tgttgaattt cttttcatgt gatcgtttgc tgttcatgta tcttctttta tgaagtgtct 40020 

gctcacaact tagactttct tttttttttt aattgaactg tttatatttg gaccagtgaa 40080 

ttgtgggaga tatttatatg ttcttaatac caatcatttg tcagaattac ttactatgaa 40140 

tagtcttcct agtctatggt ttgccttttc attatcttag cattattttg aagaatgcgt 40200 

gttttaaaat ttgatgacac acacacactt tatatatata taattatata tatatatttt 40260 

attcacaatt agtgctttta ttttcttcct aagaaatctt tgctttcctc aagatcataa 40320 

agctattctt ccattttttt ttttttcctg gaggacttgt agatttagca tttacattta 40380 

gaatttttat ttccaccaat ttagttttga gtattggcat gtggtgtgtt ataggaagtt 40440 

aaatatttta atgaacatat ccatattttc cagtagcagt ctccattaaa atacctttgc 40500 

acctttgttc aaaaatccat tggctacata tacgagtgga tctatttgtg atctctttat 40560 

tctgtacact taacctcttc atcatcctta gatccacaca ctgctttgaa tgctatactt 40620 

cttagtaact cttaggtgga tgatattaac ttaagggaca gtttttcaca tactttattt 40680 

ttttcaagaa agccttcaga aacccctgtt caacttgtgt ttgaagacat taaatgttta 40740 

gttaacctga ttttacttca attacaatga tcaattgatt gatgtctgct agaagtgcac 40800 

ctaattaatt agttatgtaa atactacaat ggtcagataa ttggacacat ggtactgacc 40860 

atggacattt aagtagtgca aattatgcaa caatatttgt tcactttcca aaggggtttt 40920 

tgtgttttta tgacagcatt tattttagaa tcagatcatg aagttttgca aacatcatcc 40980 

tgctgaaatt tgatagaagt ggctttaaaa ttattgatag atttaggaag aattaacaca 41040 

ttgactatat tgaatattct aatgcacaag tgagacgtat gtctttatta tttaggtctt 41100 

ctttaatttc gctctgcaga cgtttgtgct tttcagtgtg tatcttttgc acacttttta 41160 
tttttgtttc tgagtagctt atgattttat gtgatatttt taagttttat attctactgt 41220

gtattgcttt tatgtaactg tatgagtgtt tttgggtttt tttattttgt tttgttttgt 41280

ttttttagac agagtcccgc tctgtcaccc agattggagt gaagtggtat aatcttggct 41340

ctatgcaatg tctgcctccc aggttcaacg gattctcctg cctctgcctc ccaagtagct 41400

ggggctacag gcacgttgcc atcacaccgg tttcactatg ttggctaggc tagtctcaaa 41460

ctccctacct caaataatct gcccacctca ggctcccaaa gtgctgggat tacagttgtg 41520

agccatcgca cctggcctgt atgagtggtt tttgtgtatt gaatttgtaa ttgtaatcat 41580

atcatcttcc aaaatataca cttaatttcc catttgtata ttttaaaata acttttataa 41640

taaagcatca tgtcatctgt gaataacaca gatttgtttc ttttcaattt gtatactttt 41700

tatttagtag gtaggacttt cagtaccata ttgaaccgac tacaactttt aaataaaaaa 41760

tagccaattg agtaatcaca aacacacaag taaaatctta agggccaata ggatgtatcc 41820

aactgtctga acatggtgtc atttgtatag ctaaatccat taggtgttcc actagcagac 41880

atacatcaat tggtccttcc agcagaagca aagtgacatc gactaaataa actaatgtgt 41940

tcagtaatat agatagataa tataaaactt tctgaaggct ttctatgaaa cataaaataa 42000

gcatgtgtaa aactcatctt tgagttagta taatatacat agctaaatct gtattcccta 42060

accaaatcta taagtaaatt aatgtcacct tctaatgtgt ggacatctga tcatgtgatc 42120

ccatattaaa atttttgaga ctttgctctt cctcatcaaa aataaccaat ccattacttt 42180

ttgtctatcc attcagttat tattgagcca gaatcatttg tagtcagaag agtcattttg 42240

taactcttat gttttattga aatgttattg catattaaca tatgtaagaa aacaaataaa 42300

tgaatataca tatacataca cacatactat agggcatatt gccttcaaat tgaaagctgc 42360

cattctgttt aacaaagagc caaattgtca attagtttta tgcattaagg gatgaggaaa 42420

tgagagacac cagtattata gtcctatatg acccttcatt ctatagaaat atttaagaaa 42480

taccatttta tttttattta tttatttatt tatttatttt ttattttatt ttattattat 42540

tattatactt taagttttag ggtacatgtg cacaatgtgc aggtttgtta catatgtata 42600

catgtgccat gttggtgtgc tgcactcatt aactcgtcat ttagcattag gtatatctcc 42660

taatgctatc cctcccccct ccccccactc cacaacagtc cctggagcgt gatgttcccc 42720

ttcctatgtc catgtgttct cattgttcaa ttccaaccta tgagtgagaa catgtggtgt 42780

ttggttttct gtccttgcga tagtttgctg agaatttgct gaaaattgag tttccagttt 42840

catccatgtc cctacaaagg acaagaactc ttcatttttt atggctgcat agtattccat 42900

ggtgtatatg tgccacattt tcttaatcca gtctatcgtt gttggacatt taggttggtt 42960

tcaagtcttt gctattgtga atagtgcccc aataaacaca cgtgtgcatg tgtctttata 43020

gcagcatgat ttataatcct ttgggtatat actcagtaat gggatggctg ggtcaaaaga 43080

aataccattt tacttttaaa ttagtgcaag gaatcaaaac catacagata aaaactgaga 43140

taaggcttta aactcaattt gaggaaaaaa atagctatgc tcaaatctct caaatttcat 43200

gggatgtaac aaatctacat tttgattttg atatatagct atattgtgtt gactgatttc 43260

tcaaagaata accaccattt gtgatctaag agaattaggg atgcctcttc tgtcttggtc 43320

agagcccatt ttgattcatt tgtgctaacc tggaactatc agtgcactgg aaaatacaga 43380

ctaatgaaca catgtctagc ttcccttatc atctaacaaa tgcagtgtaa acaagtttta 43440

aatattcttc cttctatttt cagccctatt cctaagtcaa caattttatg actcatcaat 43500

tttataaacg gtcctggtct cagacctgtc tttgattctt agcttgttaa tgactttcaa 43560

tagggcagac tagttccagc ataaatccag aacgtccgtc ctccaaagga cctgggccct 43620

gaccaatact tctgcttcct tctactgatt atgttgtcac tctgctgctc tccacatcct 43680

tacaaatcct tcctttcctc aagtctttca gctgccacac ctttctaccc ctcaccctca 43740

gtaaaggacc tcaccccagt gtctatagac aaaactggag agttctgatg aaaaccttct 43800

tggttttatt atattaaaat tacaaaactt atctatctat attttccttc ttccatatta 43860

ttgaaagtga tggctagttt ctctgtttgt gctaaagctc tgatgaacac atgtaacaaa 43920

actgtacata aatgttgtat agggatttca tcttcaaagg agaatctagc tgttcaaact 43980

atcaaataaa tgttaagccc ttgaggaaca aatgatatta taattggttc tttagaatta 44040

ctttaaaaat actctaggga atatagctaa gaaatatata tcttgccacc ttgctttaca 44100

aactatggtg agttaacatt aacaggtagg tcatttgcta catttgttca ctaaaaatac 44160

aactttttct gtgtgtggag attagtatcc aatcaataga ggaaacccat attttcatca 44220

atttcactgg acttttctat tcatatgtta acatttcatt gactctaaaa tcaattatat 44280

gcaagctcct ctgatgtata tgtgagagct aaataagcaa tttctttagg gtgcaaattt 44340

ttgagcaata atttttacag tcatagctta ttttcatgaa gaactcagag agagtggctt 44400

gagttggcag caaagactgc ttttaacaat gttctgtgat tctctaacaa tattttaata 44460

agtagtttag tggcatataa atggagacaa aaaaacttca ctctcttcct tcaggtgcta 44520

taattgttta tttactgttc aaatgttgca ttctcagaaa aggctacttt gaatagtcat 44580

ctaaaagaat ctcacattca ttgttcttca cagacccagc ttttacttga aattataata 44640

tgtatctttc tctgtgggtt taaccattat ttctcatatt ataatgtgag ctccatgaaa 44700

ctggaaaatt tgttcactcc aataacccca ggaattgtaa tacacacaaa aaaatcaagc 44760

tatgcagtca gtgtacatga ataaacaggc aggtaaaatt tcaaagtgaa aaaacagttc 44820

ttggttttgt atattcttag ccatatatca agcactaagt ttaactaatt taatgttcac 44880

tgacaatcat aggtttctgt ttgtttgttt gtttgtttgt ttgagacgga gtctcgctct 44940
gtcgcccaga
ttcacgccat 
gcctggctac 
tcttgatctc 
gcgtaagcca 
tttttcttct 
tggacttcct 
ttgttttttt 
aagacaagta 
atagagaatg 
catataaaga 
gcctagatga 
tcagagagaa 
ccccaaacca 
tagagaagat 
tgtgcatgat 
ttggtcacct 
acaatttcct 
tcttcatcgc 
atgtagccat 
ctctggtcac 
cctttcactt 
ctcttatcat 
caggctttac 
cagcgatctt 
cccacctgac 
catcagagaa 
caatgctgaa 
aaatgattag 
ccctaagtgc 
cagtatgggc 
gtttttaaaa 
atgaatataa 
gggggtaaac 
tattactatc 
aaaaaatgat 
ataggtggat 
gtctctccta 
ctcaggaggc 
agatcacgcc 
aataaataaa 
tgtggtgtct 
atgtaaatat 
tgtaaacatc 
ttctgaaaat 
atttctatca 
gacttaggaa 
aaatgacttt 
tctacagagg 
aatgctactc 
agaaagcaag 
cagactaaag 
tgtgttgatt 
tgataaaagc 
gcgtgttctt 
gtcatgaccg 
gtcaccctgg 
taggggcgta 
tggcacatta 
aatcacaaag 
gactgcatat 
taatattttg 
tatgatatca 



ctggagtgcc 
tctcctgcct 
ttttttgtat 
ctgacctcgt 
ccgcgcccgg 
gttttgtatc 
acatttattt 
tgttccctca 
taaatgcatt 
tgcctgtgtg 
gaaaacgttg 
atagatgaat 
tcctcacatt 
caccatagtg 
cctgtttggg 
cctgctgatc 
ctccttttta 
ctcagaacag 
cctagtgatc 
ttgcagccct 
tgtgccttac 
atccttctgt 
gctggcctgc 
tctctcaagc 
caggatccgt 
aatagtcact 
gtctgtagag 
cccattgatc 
gggaaaatcc 
ctgtggggta 
tcttagtaac 
ataaaaagct 
attgtttggt 
ttttaactaa 
catagagcca 
gccgggcatg 
cacctgaggt 
aaattacaaa 
tgaggctgga 
attgtactta 
taaataaata 
tgaacaatag 
atgaatcacc 
gtgagaagct 
caactctcaa 
tcaaaatact 
taatttagta 
gttagtggaa 
aaaaatttca 
caaagagatc 
tggaggaatg 
taagttgtgg 
taaagaaaac 
atatactctt 
gtaggtattt 
gccaggactg 
atctcctaga 
cgtatctttt 
tttttttaca 
aggctaaata 
taatacatac 
cactctgctt 
actactgtat 



ctggtgtgat 
cagcctcccg 
tttgagtgga 
gatctgcccg 
ccgacagtca 
tgaaataata 
atttaaatga 
caaaccttga 
aatgtataaa 
tgtgtgggtt 
tgcagaaatt 
gagtgaatga 
ctttgtcact 
acagaattca 
gtgttcctgg 
aggaccaatt 
gacatttgct 
aagaccatct 
actgagtttt 
ttacattaca 
atgtatggct 
ggctcccttg 
tctgacaccc 
tctctcttca 
tctgctgaag 
ttgttttatg 
gagtccaaaa 
tatagcctac 
ttttgtaaaa 
acaaactgaa 
cactttagtt 
taatgttgaa 
ttctaataat 
cttagcctga 
tagtgaggct 
gtggctcaca 
caggagtttg 
attagctggt 
gaatagcttg 
ctccacttgg 
aaaagaaaac 
ttctgaaacc 
tataagttaa 
tcttattggt 
cacccttttt 
tcaattgcat 
ggaatcaata 
aaaggggtct 
agataagctg 
atttattaaa 
caccctcttt 
ctacatgtgg 
gatccttgat 
atgagaattg 
ttaggctgtt 
tgccttgtta 
ctcctgcttc 
tggaggaaat 
ttagggtatt 
aagatacaca 
atatttagcc 
ttaactaaca 
ctactataaa 
ctcggctcac
gtagcttgga 
gatggggttt 
cctcggcctc 
taggttttaa 
agatatcaga 
tgctaatatt 
aaataattac 
tttataaatg 
tgagatacaa 
ctgatgtgtt 
taaatggatg 
ttcagttttc 
ttctcttagg 
cgatctacct 
cccaactgca 
attcttccaa 
cctacgctgg 
acttccttgc 
gttccaggat 
tccttaatgg 
aaatcaatca 
gtgtcaaaaa 
tcattcttct 
gcaggcacaa 
gaaccctctt 
taattgcagt 
ggaacagaga 
ttgcagttta 
atggaaaaac 
tcttctcaaa 
attaataatg 
tctgtatgaa 
gagaaactct 
ccagataggg 
tctgtaatcc 
agaccagctt 
tgtggtggca 
aacctgggag 
gcgacaagag 
gaaaaaagaa 
tgtgaactta 
attttctgta 
tcatgtggta 
ttttttttga 
tccacgtatg 
gggtttatca 
tgatccagat 
caaagtgcag 
ggctattgca 
aagttttttt 
gtaggctgac 
attttagtgt 
ggacacctag 
tcctaaacta 
acctcaagac 
tttaacaccc 
agtgtttatt 
ctagttctca 
ttaatcaatg 
ctattaaatt 
ctatacagaa 
atgttcaagt 



tgcaagctcc 
ctacaggcac 
cacagtgtta 
ccaaagtgct 
gtgaaaatgc 
catggaggga 
aatatctgat 
caaatgtaga 
tatatatata 
aaagagacag 
tattgaaaaa 
aaacaaatgc 
aagaaataag 
actgacagac 
aatcacactg 
aacacccatg 
tgttactcca 
atgcttcaca 
ttcaatggca 
gtccaagaac 
gctctctcag 
tttctactgc 
gatggcaatg 
gtcctatctt 
agccttttct 
ctgcatgtac 
cttttatact 
tgtaatcctt 
ggcctgtgtt 
ctagtgtagt 
attaacactt 
ttttatttgt 
aatactgggc 
gaaaatccaa 
tggctaattt 
cagcactttg 
ggccaacatg 
catgcctgta 
gcggaggttg 
cgaaactcca 
aaaaattgtt 
ttgtaatcac 
agtattctca 
ataccagtta 
gttttcaagt 
ttttccagtg 
attgatattt 
cccagcaggg 
tgagaagaga 
ttacagagta 
aatggtcttt 
ggcatgacaa 
gtgcataact 
gttctcttgc 
taagcatctt 
agagttgatt 
tgaaacagta 
ttgcttcaca 
aagctactac 
tattatttag 
gcctcttatg 
aatgctacag 
ggttttccag 
agatgactaa cagtttttgt aaaattgggt tatattctta agtgtcagtt ggttgaataa 48780

gtctggttct atccaaatct gcttttagaa acagtaactg aaaaagaaag aaggcaattt 48840

tgggcagcaa caatttaata tttacctttt agcactattt gtgtatttgc ctggttttgg 48900

caatggcagt ggcacaacag aagcaggtga tctgtagaga caagagtgaa gcagaagagc 48960

tgggaactga gtgacacagc gtccttactt ccttttagag gcccagcatc atttcatctg 49020

tgggatattt ctttatctta tgacaacatc atttaatctg tgggatagct ctttatatta 49080

ttcttttatc ttcctctttt ggttaggcaa agcattgtgt tcaatttcta gcaagtagag 49140

tggtctgata ctctatttga aaactgtatt ttagaattgg tacaaacaca ctctatattt 49200

tgcctaaata actgtatact tccttttatt ttctagttat gagaattgct gtggcttgca 49260

gcaaccaaac ttaaactttg acacctttgt aacccaaatg cttgacctgt ttgtaatctc 49320

ttttttttaa aatatgtatt tatttattct atttgtaatt taatttttta attgtgcaag 49380

cccttgggta aaagttatct cagaaattct ttggagatag aatagaacat aaatggacat 49440

gaatagaaat agtgacaatt gatgtaatca tagtgtattg ctaattttag gattactgtt 49500

ttttctttac ttaaatgaaa atgaagtagt gagagaatta ttatcttcat ttacaatatt 49560

tggagtctta gacacagaga gatgaaatga cttgcctaac gtcatttggc cactcagtag 49620

cacaaaagca gatttcttgg ttttaaattt tctctctttc taattcatac tgaagtggtt 49680

attaagccat tcttcaacac acttcaaatg ctggggggaa tattggagac tatgaattaa 49740

taagagtgtt actatagatt aaatagtggt tattatgtga tagtgagatg tatcctaaca 49800

acagctaatg tgaattttcc ctgacagtca ggctaatgtg atcatttata tgcatgataa 49860

ttaaaattgt cttaatatcc atgtgaaggg agaaaatttt gatatatggg aaacctttat 49920

gcttcttagg ctgcatttta ttaatatatt ttcccaatta taaaagcaca catatttata 49980

ttaattaatt tccaggtaca gacaaatata aaaaaataaa atttattaag gacctgctat 50040

gacaaggaat aagttttgtt tatattttgt tatatttctt ctcaataaat gtatatgact 50100

acatgttact ataatgcaca aaacaaatat aaagttactc accatataat aatatgaata 50160

cagattttca tttgatacta taatgtatgc acctttacat agattaaata ttttatgaac 50220

atttttaatg tcatatttta attagcttgt gtgatcctaa tacaatatta atacatgtca 50280

aatgtagaaa accaatgaac catattggca tacataagaa atagaagcaa tgcttacttg 50340

taatcaacta cctgaaaaga gttgctctca gcattcacat gtgttttatt ccatttgatt 50400

atcttgcaaa tcatgattcc atcatatcaa tgtatcttag tattttgatg accgacgtat 50460

ctgtacgtct ctctctaatt gttccttttg ttatgttccc agaagtgtaa ttgctgaatt 50520

aaatgttttg aacattttta aaggtcttgc taaacatttc tatggataaa tttttatcca 50580

gaaagcttgt aacaaacctg taccactaat gtttgaaaac aatcatttta ctataactct 50640

atcagcagta ggagctcttg tttatggtta tatgctactt tgatctctca aaggaagttt 50700

tagaaacaaa ttctgtttta atttgcacag attagaaatt gaagaatata tttaacaaat 50760

aaaggctaca tataataatt ttctttcatg ttatgatggt gtcttacatg aagaatattt 50820

gtatttttat ggagaaaaat tatattttgt ttattgttgt tttgtaactt tttcttatgc 50880

cacaattgaa taaacaagca cttatatggg gttattggct ttcattgcaa tattttatgg 50940

gaatatcatg aattatcctg gatataattt tgttgcatgt tttaatgttg taaattactt 51000

ttgtgccctc acatttttaa ccaacttcct ccaagtgtta ttaagcactg ctttcccact 51060

gatttgtgat gtcaccttta ttatacctta agtcaatcta tttctgccca tttccactgt 51120

tttttttgtc tgaattataa atatatttta aactaaattc ttatcattaa attgaacata 51180

tattttaatt tctggtgaag atatttttgt cttcaatttt ctcttaggag attgaaaatc 51240

agatactttt agctatccaa atttttaact cagaaattat aatattatga ttcaatttag 51300

aaactgtatg tatagttgta taattttgaa tgtagttgta catacaatac ttatacatta 51360

taaagcaatt aaaaaataag gaaaattatt ttactgtcta tggtggccat agggagaact 51420

gcttaaggat gaaatatttt gaatgagttg tttgttttcc agaaagtcag gaagatgctg 51480

gtgtctggta agtgaggaga aaacacctaa ccttggtata ttaattctat ggatttatta 51540

acactgttat acaaatactt atgttttcac cataccagtg tatggtgaaa tcatcatgta 51600

acattttttt tcaacaaata tctctaaata cagtctcttt ttcttacaaa gaggtttttt 51660

tcccagcaat tacccccaaa agaactgaga gggccacata tgtttttcca ctatgagtca 51720

cttcaaacac tgaacaaaaa aggattaatc agaaagaaaa taaatattta aatttaaata 51780

atcagagaat aatatatctg tgcttcataa cagtgggaaa aatttgtagt aaaaagagta 51840

agcaaattgc tttaatttga agataaaact atttaaaatt aagaggcacc ctacaatttg 51900

gcaatgtatt tgccacaaac aagataaatt ggtaaaatta ttgaaataca agaaattaat 51960

aaaaataatt tgaagcacta aagaatcaat atataaatat caaagactat aagtaaaaaa 52020

ttgattaaaa tacaaaaatg ctaaaattaa cacacaaaat attagcaacc aaaaatatga 52080

tgtaagatat tgtttgatat atctacattt aaaagagctg atatagattg gcactttcat 52140

acattgattg attaagggta taaagttttt ctgaaaataa atttaatatt taataggttg 52200

gtacaaaagt aatcttgttt ttttgccatt aaaagtaaca gcacttattt tttttcttac 52260

caacctaata ttattatcat attaaggaga caaaattttc ttatgctttt attcagctat 52320

ttcaatttta agaatctact gaaggcataa gaaacttgtg tgaagagttt catggtgttg 52380

tttttatttt ataatcaaga caaagtaaac acatgatgta ttaaatacaa atagttcctg 52440

cttggcatgg taccatgtaa cagaaacctt tccatactgg aaacgtgtct tagtttttgt 52500
ggttcttagt tcctacaaaa tcacaaagtg agaactgcct gcatttaata actggtaaaa 52560

tacaggcatc caccagatgg gatagaagct gtctctaact aagtagaatg tcattagcct 52620 

tgatcatgaa ggaaagtgta aagtacaggt gaaacgtttt ttgtaataga aagttataca 52680 

gccattaaag tgaggatgtt cctagttaag aaaaaggtaa ttcattttat gtaatatatt 52740 

atgtgcaaag atatatccta tgctgtatac agtgtgatat cttgtaaaat attagggata 52800 

aggggaaata aaacttgtta ttaatagcta tgtccatatt taaatagtca ttgctattgt 52860 

gagaaaataa ggcaacaaaa actgtgtatt ttcagtttag tccagatctc aggatgtata 52920 

attacaggat acttatttat taatattgga atttaaatat gaaagtaagg aaaagcaagg 52980 

aaaaagcaag gtaagaaaaa gaagaggtcc ccaattctgg tttccagcac agaatgacca 53040 

atgaacaact tctcagagaa aaatgcccat gtgaaaattc cgtaacccat gggtgaggtg 53100 

gggcacactc ttggagcata aaaactggaa aaaaccacat tagaattgtt aagaggagca 53160 

gtttcacttg cttctttccc agccaaacac agtgtcacac tgaaagagat tccctgggcc 53220 

tgtggtttcc ccagtgaaga aaagagagcc cacggcagac atccagcttc cctatgttcc 53280 

agagaacttc ccaagaggcc cattctgtct aaccttgtgg gtaataacgg aagacttggc 53340 

aaagcttggc caccccagtc agctcataac aaagttaagg ggtgtagctt acggcaacca 53400 

gcacccgatc ttggtggtgg catcatgttc tagctggcag ctttatgtga ccagagagcc 53460 

cagccactag cattgctcac atgtggagct gagttggcag gccctttcgg tcaggaggca 53520 

tgttgaacag ttctgtctgg ctgtggtacc agtcagtaag ctggaccagc tgcagagcac 53580 

agcctagagc accacccaac cacaaagcac aacttgcagc ccaccccact caggtagaga 53640

tccaagccag caaccccact caaccgctgg gcatagcctg cagtcccatc tgacgacgga 53700

gtctggtcaa ttatctctcc aaactacaga gcacatccag taaaacttcc caattatgga 53760 

tcagcttgca gccccaaaaa actgcaggaa atagccagtg gccctatctg gccaagagat 53820

ctggtcagtg atctcaccta aatttggagc aaaggcagtg accccatcaa tctgtggacc 53880 

acagccagca gccccactcg aatacatacc acagtgaatg gtcttacctg attatggagc 53940

ccagtcagca gccctgagtg attttggatc ctagccagca tactcaccca cctcagagca 54000 

caggcagcag tcagaggtca tgtgactaga gtcagctttc aaattcatca agtcctggtt 54060 

ccacctctta ctccctatga aatctcaggc aaattattta acctctactc ctatgcccca 54120 

atttgtttgt ttgtttgttt tcacagtact tatcatcacc tgccatatat ttgtttgtgg 54180 

gtgtatttgt catctatctc ctccagcaaa aaacattatc aataaagtag tttttaattt 54240

tttttgagca acagctatat tgagttatat agtggctata ctaatttaca tttttgccaa 54300 

taatgtacga gcattccttt ttctgtgtat ccttgcaaac ctctgttcat ttttttgtct 54360 

ttttaatgac aatcattaca aattggttga ggttatactt cattatggtc ttgatttgca 54420

tttcatttat gattagtgaa ttaaaccttt ttttcatata cctgctgaca atttgaatat 54480 

cttctttttg aagcttttga taaaatccaa catctcttca taataataat aataaaaaaa 54540

tactcaacaa actaggcatt gaagagacat acctcaaaat aataagagcc atctatggca 54600 

aacccacagc caacatcata ctaaatggga aaaagctgaa agcattccac ctaataaatg 54660 

gaacaagaca aagatgtcct ctctcatcat tcctatttca catagtactg caagttctag 54720 

ccagagcaat aaagccagag aaagaaatat aaggcatcca aataggaaaa gaagtcaagc 54780 

tatctctctt cactgataat atgattttat acctagaaaa ccctaaagac tacaccaaaa 54840

agctcctaga tctgataaac aactttatta aagtttcagg atacaaaatc aatgtataaa 54900 

aatattagca tctctatact cccataacat tcaagctgag agctaaatca agaatgtaat 54960 

cccatttaca atagccacac gcacaaaaat aaaacaccta ggaaaacaac aaaccaagga 55020 

ggtgaaagat ctctacaagg agaactacaa aacactactg gaagaaatca tagatgatat 55080 

aacaaatgga aaatcatccc atgtttatga attagaagaa tcagaatgct aaaacagcta 55140 

tattgcctaa agcaatctac agattcaaca ctattgccat caagctaatg acactattta 55200 

catagaattt gagaaaactt gcctaaaatt tatatggagc caaaaaagag actgaatagt 55260 

caaagcaatc ctaagcaaaa agaacaatac tggcagcatc aaataaccca acttcgaact 55320 

atactacaag gctacagtaa ccaaaaaagc atggtactgg tacaaagaaa gacccataga 55380 

caaaatgaaa cagtgaaccc agaaataaag tcagacatct acaactgtct gatctttaat 55440 

aagttgatag taataaacaa tagggaaaga actacctatt caataaatgg agccaggata 55500 

actatttagc catgtgcaga agagtgaaac tggacccata cctgtcacca catataaaaa 55560 

ttaactcaag atggattaaa gacttaaata taaggcctaa agctataaaa atcctggaag 55620 

aaaactgagg aaataccact ccggacattg gcctcagcaa aaaatttatg aataagtctc 55680 

cagaagcaat tgcaataaaa ataaaaattg aaaaatggga tgtaattaaa ctaaagagct 55740 

actgcataga aaataaatta acagagtaag caatctaaag aatgaaagaa aatgttcaaa 55800 

aatgataaat ccaacaaaaa tctaatatcc agaatatata agaaacataa acaattcaac 55860 

aggcaagaaa cctcccccta aaaccattaa aacacatgga cacagggagg ggaacatcac 55920 

acactggggc cttttgtggg gtgggaggct aggtaaggga taccattagg agaaatacct 55980 

aatgtagata acggattgat gggtgcagca agccaccatg gcatgtgtat acctgtgtaa 56040 

caaacctgca cgttctgcac atgtacccca gatcttaaag tataataaaa aataggcaaa 56100 

agacataaac agacacttct caaaatgaag catacatgtg gccaaaaaat atattaaaaa 56160 

tacagtatca ttaatcatca gagaaatgta tatcacaact acaatgagat aacatctcac 56220 

accggtcaga atggctatta ttaaaaagtg aaataaataa cagacattgg tgaggttgtg 56280 
gagaaaatgg aatccttata cacttttggt gggaatgtaa attagttcat ccactgtgga 56340

agcagtttgg aaatttctca aagaacttaa aacacaagta ccatttgacc cagcaatcct 56400 

attactgggt ataaacccaa aggaaagtaa ataattttac caaaaccaca tgcacttgtg 56460 

caatcattgc agcactatac actgtagcaa agacatggaa tcaacgtaga taaccatcaa 56520 

tgatgaacgg gataaagaaa atatgataca tacacactat tgaatactat gaagccatga 56580 

aaatgaaatc atgtcttttg cagcaacata gatgcagctg aaggccatta tcctaagtga 56640 

attaatgtag gaacagaaaa ccaaatgtgc ccatgttctt attaaaggta tgagagaaac 56700 

attgaataca catgaaaaat aaaggagaac aagcaacact gggggttaat agaaggggaa 56760 

gagctgggga gtgcagactg aaaaatgatc agaccaatgg aacagaatag agagcccaga 56820 

aataagacca cacatctatg accatctgat cttcaacaaa gctgacaaaa acaggcaatg 56880 

ggaaaaggac tccctattta ataaatggtg ctgggataac tggctagcca tatgcggaag 56940

attgaagctg gacccccttc cctataccac acacaacaat caactcaaga tggattagag 57000 

acttaaatgc aaaaccccaa actgtgaaaa ccctaggaga aaatgtcggc agtatcatcc 57060 

tagaaaaaga aatgggcaat gatttcatga caaagacaca aaaagcaatt gcaaaaagaa 57120 

aaaagcaaaa attgacaagt aggatctaat taaacttaag agcttcttca cagcaaaaca 57180 

aactgttaac agagtaaaaa gacagcctat ggaatagaag aaaatatttc caaactatgt 57240 

atctgacaaa ggtctaatat tcagcatgtt taagaaactt aaatttacaa gagaaaagca 57300 

aacaacacat ttaaaagtgg gctaagatgc tggcaacagt ggctcatgcc agtaatccca 57360 

gcactttggg aggcagaggc aggtggatta tctgaggtca ggagttcaag accagcctgg 57420 

ccaacatggc aaaaccctgt ttctactaaa aatacaaaaa ataactgggc atggtggtgc 57480 

gtgcctgtaa tcccagctac tctagaggct gaggctggag aatcacttga actcaggagg 57540 

tagaggttgc agtgagccga gattgcacca ctgcactcca gcctgggcga cagagtgaga 57600 

ttctgtctca taaataaata aataaataaa taaataaaat acataataaa agtgggcaaa 57660 

gaacataaac acttctcaaa agaagacata catgcagcca acaattatat gaaaaaaggt 57720 

caatatcact gattattaga gaaatgcaaa tcaaaaccac aatgagatgc catcttacac 57780 

cagtcagaat gactattatt aaaaagtaaa aaaataacag atgctggtga ggttgcaaaa 57840 

aaaaaggaac atttttatac tgttagtggg agtgtaaact cgttcagcca ctgtggaaag 57900 

cagtatggca attcctcaaa gagctaaaag cagaactacc atttgatcca acaatcccat 57960 

tactaggtat atacccacag aattataaat cattctacca taaagataca ttcatgtgac 58020 

tgttcatttt ggtactattc acaatagcaa aggcatggaa tcaacttaaa tgcccatcaa 58080 

tgacagtttg gataaagaaa atgtggtgca tacacaccat gtaatactac atagccataa 58140 

agaagaagaa gatcgtgtct tttgcaggaa catggatgga tctggaggct attatcttta 58200 

acaaactaat gctgaagcag aaaaccaagt accgtatgtt cttatttata agtgggagct 58260 

aaatgattag aacttacaaa cacaaagaag gaaacaacag acactggggt cttcttgagg 58320 

ggggagggtg ggaggaggga taggagcaga aaagataatt atgggtacaa ggcttaatac 58380 

tctggtgatg aaataatctg tgcaacaaac tcccatcaca catgttcacc tatgtaacaa 58440 

accttcacat gtatccccaa aactaaaata aaagtttttt tttttaaagg aaaactgttg 58500 

ggtactatgc tcactacctg ggtgatggga tcatttgtat accaaatatc agtgacattc 58560 

aatttaccta tgtaacaaac ttgcacatat acccctgaaa ctaaaaatag aaaaaattat 58620 

taaaatacaa caaaaaagaa aaaagatgaa aaagagatat ctataatcat ttgcacattt 58680 

caatcagatt tttaaaaatt tatttatttt gaaattgagt tgttcaaatt tcctatgtaa 58740 

cctgaatatt agtcccttgt cagatgaatg atttgtaagt attttctccc attgtgtagg 58800 

ttgtcttttc actctataga ttacttcttt tgctgtgcag agatttcttt tgtttgatga 58860 

aatcctgttt atctatattt ggttttgttg cctgtgcttt agaggcctta tccaaaaact 58920 

ctttacctag accaatgtct aaaagtattt tcccagagtt ttcttctagt agatttatag 58980 

tttcaggtca tacatttgtc tttaaaacat tttgattaga tttttgtata gtagtctatg 59040 

atagtctttt atatgtctga ggaattgatt gtaatgtcac ctttgtcatt tctgatcgtg 59100 

ctgatttgga tcttctttct tattttcttt gttaatctag ctagcagtct atccattttg 59160 

tgtatccttt caaataaact tttggtttta ttggtttttg gatgtatttt ggggtctcag 59220 

tttcattcag ttttactctg atttagttat ttattttctt ttactggctt tggagttagt 59280 

ttgttgtttt atagttcctc taggtgagat gttagatcat taatttgaga actttctaac 59340 

tttttgaggg aagtgtttag cgctgtacac tttcctttta acactgattt tgctgcatcc 59400 

cagagatttc ggtaagttgt gtctctgttt ttatttattt caaatatata ttttgttaat 59460 

ttctggttta gttgtttgcc caaaagtcat tcagtagcaa attgtttaat tcccatataa 59520 

ttttgtggtt ttgagagatc ttggtattga tttttatttt tattccagtg ttctccaaga 59580 

gggagtttga catgatttta tttcttaatt gattgacact ggctttatgg cctagtatgt 59640 

gatcaacctt ccagtatgtt ccaggtgcag aggacaagaa cgtatattct gtggttatgg 59700 

gtggagtact ccgcagatgt ctattaggtc caattaatca agtgtcaaat ttaagtccag 59760 

aatttattag ttttcagacc ttgatgacct aaggctgtaa ttggggtgtt gaattcccct 59820 

actgctattg tgtggctttt taggtctttt tgtcagtcta gaagtacttg ttttatgaat 59880 

ttgggtgctc caatgtgggg tgtgtatatg tgtaggatag ttaagtcttc ttgtttaatt 59940 

gaaccctatg tcattaggca gtgccctttt tgtcattttt tccagttttt ggtttacagc 60000 

ttcttttatc tgacataaca atagcaaccc ctgctgtctt ttatttttca tttgtctagt 60060 
agattatttt tcatcctttt actttgagcc tatgtgagat agatctcttg aaaattgcag 60120

atggatggat cttatttttt atctaacttg ctactctagt atatgcctgt ctttataggt 60180

aaatttcttg aagacagcac atacttgagt attgattttt gtttgtttgt ttttcttttg 60240

ttttactttt ttttccattg agcagtcact cagcgtcttt taattgtaga attttgtcta 60300

ttaacattga agagtattat tgttaggtaa gagtttacta ctgccatttt gttatttctt 60360

tgcttgttgt tttctaaatc ctttatttct tgtgttctat cttactgtat tactctgtag 60420

ttaagtaatt ttctttagaa gtatgttttg attccatgct atgtttttgt gtatctgtca 60480

taggtttttg ttttgtgggt accgtgaggt ttacaaaaaa aatctatagt ataacagttc 60540

gttttgaaca gataatgact taattttgat tgcaaataaa agcatcaaaa accaactcta 60600

cacttttaat atgccctaca cacattttga cttttggtgt ttcaatgtac atgtttttat 60660

attgcctact tctaaactgt agttattgat ggttttaata gttttgtatt ttattcttca 60720

tacttatgat acaagtggtt tatttaccac aattatagta tgacagtatt ctgattttga 60780

atgttctttt acgagggaat tttatacatt ccaatgtttt cctgttacat gttagtaccc 60840

atttctttca gactgaagaa ctctctttag catttcttga aaaacaggtc tggtgttgat 60900

gaatttcctc agcttttgtt tttctgcgaa aaaaaaatct ttatttcacc ttcatgttcg 60960

aagaataact ttaatgcaca taatattctt ccttgaaaga ttttttctct tatcatcttg 61020

agtatagcat accatttcct cttgacctgc tagcttctgt gagaaatctg ctgaaagcca 61080

tattcaagct ctgtattact ctgttctcac attgctaaac aaaattacct tagcctgagt 61140

agtttataaa gtgaagaggt ttaattgaca cacagttcca caggctgtac aggaggcatg 61200

gctgggaggc ttcaggaaac ttacaatcat gatggaaggt gaaggggaag caagcacatc 61260

ttcacatggc gacagaggag agagggtaaa ggggaaagtg ctacacactt ttaaacaaac 61320

agatctcatg agaactctat cataagatag cactagggga atggtattaa accattagaa 61380

aacaccccga tgatccattc acctcccacc aggccccacc tccaaaactg ggggattaca 61440

attcaatata aggcttaggt gggaacacag agtcaaacca tatcattctg tcccggtccc 61500

tcccaaattt catgttcttc tcacatttca aaacgcaatc atgtcttccc aatagttccc 61560

caaagtttta actcattcca gcacaaactt aaatgtccaa ttcctcatgt ccaaatgtcc 61620

aaatgtccaa aatctcatct gagacaaggt ccacaggtaa gcctgtaaaa ctaaaaacaa 61680

gttatttatt tctaagatac aagggagtta caggcagtgg ataaatgctt ctgttccaaa 61740

aggaagaaat tggccaaaat aaaggagcta catatgccat gcaagtccaa aatccagaag 61800

ggcagtcatt aaatcttaaa gctccaaaat aatctccttt gactccatgt ttcacatcca 61860

ggccacacta atgcaagggt tgctctctca aggcctgggg caactccacc ctcatggctc 61920

tgcagggtgc agccctgttg gctgctttca caggctgatt ttgagtgctt gtggcttttc 61980

cagatgcatg gcacaagctg tgagtggagc taccattctg gtttctgaaa gatagtgtcc 62040

cttttcttac agatccatta cgcaggaccc agtgggaact cggtgtgggg gctccaaccc 62100

cacattttcc ctctgcattg cactagtaga ggctctctgt gaggattctg cccctgcagc 62160

agacttctgc ctggatatcc aggcatttgc atacaacctc taaaacttag gcaaaagttc 62220

ccaagcctta attcatgtct tctgcacacc cacaggccag acactatgtg gaacttgcca 62280

aagcttgggg cttgcaccct ctgaagcaat ggcccaagct gtaccttggc actttttagt 62340

cctgactgga gttggagtgg ctaggacgca ggtcatcatg ttctgaggct acacagagca 62400

ctgcagacct gggtctggcc cccgacatca ttttttccta tttgccctct gagcctgtga 62460

tgggagtggc tgtaataaag gtctctgaaa tttcttggag gcattttccc cactgtcttg 62520

gctattaaca tgtagctcct ctttatttat gcaaacttct gcagccttga attcctccct 62580

ataaaatggg tttttctttt ctaccacatg ttcaggctgc aaatttttca aactttcatg 62640

ctctgctttc cttttaaata taaattccaa tttcagacca tctctttgaa aatagagaag 62700

aagccaggct aattcttgaa tgctttgctg cttagaaatt tcttccacca gatacctgaa 62760

ttcatctctc tcaaattcaa agttccacag ctctctagag caggggcaca atgccaccag 62820

tctctttgtt aaagcatagc aagagttacc tttactccac ttcctaataa gttctacatc 62880

tccatctgag accacctcag cctggacttc actctccata ttactattag tattttgttc 62940

accaccactc aacaagtctc taggaagttc caaactttct cctcttcctg tcttcttctg 63000

agccctccaa actgttccaa cctctgcttg ttacccaatt acaaagtcga ttccacattt 63060

ttgggtatct ataggaatgc cccacttctc tggtaccaat tttatgtatt agtccattct 63120

cacatcactg taaagaacta actgagattg agtaatttat gaagaaaaga ggtttaatcg 63180

attcaccgtt ccttaggctg tataggtagc atggttgaga aggcctcaga aaactaacaa 63240

tcatggtgga agggcaaagg gaaaacaagc acatggcaag aggagaaaaa cagagcgaag 63300

ggggaagtgc tacacacttt taaataacag atattgtgag aactctaaca caaggcaaca 63360

ctaggggaat ggtgctaaac cattagaaat cacccccatg atccaatcac ttcccaccag 63420

gtcccacctc cagcagtagg agacggcaaa tttagatgaa atttgggtgg agacacagag 63480

ccaaaccata tcagtctcca ttaaatgtga tatttttctc ttgctgattt gaatattatt 63540

tctttgtctt tgatttttcc taatttgatt acagtgtaac ctgagatatt actgtttgta 63600

ttgaatttga ttggtgagca atggtctttt ctacctagat gctattgtgt ttctccagaa 63660

ctgagaaatt ttcagttatt atttcaataa gtatagtttc taggcctttt tatctctcat 63720

tccatttgga tatttcaatt atgcaaaagt taatgtactt gattgtgtcc cataattttc 63780

ataggcctgc tttacttttt tttttttcag ctcctctgag aagttagttt caaatgttct 63840

atcttcaagc ttactgattc tttcctctgt catcaaaact atgggcacgg tggctcacgc 63900

ctataatccc agcactttgg gaggccgaga agggtggatc atctgaggtc aggagtttga 63960 

gaccagtctg gccaacatgg tgaaacccca tctctactaa aaaaatacaa agaattagct 64020 

gtttgtgatg atgggcacct gtaatcccag ctacttgtga ggctgaggca ggagaatccc 64080 

ttgaacccgg ggggcagagt ttgcagtgag ccaagatcgt gccatcgcag tctagcctgg 64140

gcaacaagag cgaaactctg tctcaaaaaa agaaaaagga tagaggaggt tgggaagagg 64200 

gaggcaggga gtcagcgtca tagcaatggg tgagaaagat gaaagatgtg tggccattgc 64260 

tggctttgga ggtagaatgc gggcagcctc caaaacctgg aaaaagcaag aaagtagatt 64320 

ttccctgaga acctccaaaa agaaagcagt cctgccgaca ccttgattct agcccagtaa 64380 

gaagcatttt gggctttgga catccagaaa tgcaagacaa aatttgtgtt gttttaaggg 64440

attaaatctg tgatagttta ttatggcagc aacagaaaaa gaatatacct gtcaaaactg 64500 

ggttcccatc ttcaaaggct tcccctctgc ttgccgtcca gtccacagct ccacttgcag 64560 

gtgcagtaca gttggctttc gagcagggct tggattgaaa attcagcatg taaggaagaa 64620 

gtctaaagtc cagacgaacc ttgaacaccc cttaattgcc cctcccgtcg agagcaggga 64680 

ccaggatcca ctgagtgtag agcagagcaa cattcctcaa ggtgtcctca cccccaggca 64740 

ccctcccaca gggctcaacc agctaacctc tttccaccct ttgaattgtc tctctctccc 64800 

ttcaggcata taattaaatt tgcataaatt caaagagcat atgctgcaat cccattgtaa 64860 

tttcagattt gtttttattt cctgagtttg ctaaggaaat aataaaaaca aaacctcctt 64920 

aaaaaaaaaa aagatttcta atgagttttt aagttcagtt tttgtattct ttatttctgg 64980 

gattaatatt tttaaacatt gtttttattt ctctttccaa tttcttattt tcttcctgaa 65040 

ttattttcct ttttttttag ttttctgttt tcttataatt tcttttttta acatttttct 65100 

ttctttaaat atttttttaa attccttatt tcctcagaga tctttgttct gacctgtaat 65160 

ttcttgaact tctttaagag gcttatgctg aattcattgt cagacattta atagatcttt 65220 

aattgttcag ggtccattgt tgagcctctg ttgatttctt ttggtggtgt catatttccc 65280 

ttggttttta taatcttcgc atatttatgt cgatgcctgt gcatttgaag agagagacac 65340 

ttccagtttt cgcaggtgtt ctttgttggt cttagatctt tagttcttag tattcttttt 65400 

ttttttttta agacagagtc ttgctctgtt acccatgaat gcagtggtgt gatcttgggt 65460 

tactgcagcc tctggctccc aggttcttcc agcgattctc ctgcctcagc ctcccaagta 65520 

gctgggatta ccggcataca gcactacgcc tggctaattt ttgtattttt agtagagatg 65580 

aggttttacc atggtggcca ggctggtctc gaactcctgg cctcaggtga tccacccgcc 65640 

ttggccaccc aaagtgctag gattacaggt gtgagccact gtgcctggcc agtattcact 65700 

cttaaatgtt gacttgttgt tgcttctgct ggggtaggct tatagtgacc actgcaacta 65760 

aatttctctt ctgtcattgt ttcctagtgt ggggaagtct tattttgtgc actgaagctt 65820 

aaatactgtg ttgggactat attgctgtgc tgtcattgtt tccaattccg gaaagttatt 65880 

aatcccagtc ttttaattgt tttggggcca ggctggggga ggcttcatga aagaaccttg 65940 

ggtttgtgga aaatccggct aagaattctt gctgtactgc caactgtgct tcctgtattt 66000 

ctatgacact aatcggtctg ttaaatgtgg catctctact aattacagtg ctgagtagcc 66060 

accaggctcc atgcatcagg tactgcattc agtgtcctat tctttgttct cagttcacct 66120 

cagatgattc aattctccta gcactcccaa aggtttttat gagagaggac cagagtggat 66180 

tactgatgaa gattcctaag ctggagggga gactgaacat ccaattccat ttcccccctt 66240 

cctccttagc agacataggt ctagggaaat tctttgtgag tggcattata ctagcttggg 66300 

aaagggggta gtgcaaccgg aaatgaccgt ttcttttaca ctccacaact ttccttgatt 66360 

ctgcacttct cctctgagtt ctgttgtatt tacaatggag ctcccatctt tgaatagctt 66420 

ctagttttat ttttatgggg acagtgatgc taggggattt tctattccat catctttctt 66480 

ctttctgact ttaccctcca aaagtgtaat cctcaaaatt tcaagatttt tttttgtatg 66540 

ccttccatgg agctcataca acaaacaatg atacatttga aattatgaca atatgactga 66600 

agcataatta gtttcctaat cccagagtcc tttaattaaa cacaagaggg atgattccag 66660

aagtaacact catcactgct attccttagt ccctctgaga tttgctttaa gagcaggtct 66720 

ccacctctca catcttgcta taaatatgac tgcattttaa agcttagaga agcacctaaa 66780 

aaccttttta atcaaatatc aaggaaatga gtatatatca gaaccagaat taagtccttg 66840 

catttttatg ttctttattt ttggttaaag gggaaatata ttaatgtatt aactgaatct 66900 

cacaaaatgt ttgcattttt attattaaaa atcatttttt tcttttactt tgcatatcat 66960 

tggggggcat gaatcagttt ctcctctgaa aaattctcag tgccttctct cagaggctgt 67020 

gagcaaagaa gtaggaaagg aggttagcta tgaaaagtgt taactccatt tgtaacaata 67080 

ctcattacat gcttactata tgcaaatcat ttttctaagc actttaaatg agtaaactat 67140 

attattcctc ctaaaaactc catttctcat tttatagctg ggaacactaa cgtaagacag 67200 

tttagataat ttgacactgt ggtggtggta cgatttgtta ctgatgaagc tagaatcaaa 67260 

atcactaatg gggattagat tcataaaatt gttacttttt gaatactaaa atatttactc 67320 

agacattcag tatattagag taggcccagt atcatgtaat ctaatttatt ttgcaaaaat 67380 

tgcaaatgag cgagatatta accaagttca attagaaagt tagttggtgg cagagctgag 67440 

cataggttcc agagccaccc agaacttttg tttgcttggg agaatctctg agcacacttg 67500 

gatagcttag ggaattgctg cttttgggta tcactggtca tttgcatctg atcataggta 67560 

ctagactgct gacaagttgg ccattagtgg acaggtatat tgagagttga tttaagtgtt 67620 
gcatctgtca aggttaacac gaaacattct gttggccctg agtctcacat ttcaatctag 67680

aatccatctg tgtactttac tttccttgca aataaatgat ctagtgaatt tatatatttt 67740

ttccacttag ctcctagcac acatcatttg tatttcaatc atttattttt cttcaccata 67800

ttagaacatt tccaatacca ttaaagaaga aggaaggcaa tgaaaaggag acggagatag 67860

cctcatacat tctcctttcg ctgaaagtaa tatcattttg aaaacagact acgggtccca 67920

tcttcttcct ctctagtttc caacagcata aagctttgta tgtctctgag gacaagatag 67980

aaaactacat tagcaaatat tttcctttgc atctccttat ttcaagtaaa tgagggactt 68040

atattttcaa atattttaag tcaccttgaa ggttgcttct taaaaaataa ctgtttatta 68100

tgcttctact cttcatttta aaaattttgc ttgtttccac ctttacaaac aaaaatggac 68160

aaaatcgtat tttatcctgc agtcctctag gacaaattta actttggggt actctgtttc 68220

ctaaaaatgc atatatgtaa tcttctattt ctagtagcag agtttgaact caattaattt 68280

ccctaacaac agactcacta aaattcgctg tgatattttt caaatagtat tttatggaat 68340

tatcactgat gaaataaaaa acaaaacaag aaagacaggg gaggagccaa gatggccgaa 68400

taggaagagc tcgggtctac agctcccagc gtgagcaaca cagaagacag gtgatttctg 68460

catttccatc tgaggtactg ggttcttctc actagggagt gccagacagt gggcacaggt 68520

cagtgggtgc gcacaccctg cgcgagcgga agcagggcga ggcattgcct cactcaggaa 68580

gcgcaagggg tcagggagtt ccctttccta gtcaaagaaa ggggtgacag atggcacctg 68640

gaaaatcagg tcactcccac cccaatactg cacttttctg acgggattaa aaaacggcgt 68700

gccagattat atcccacacc tggcttggag ggtcctacac ccacggagtc tcactgattg 68760

ctagcacagc agtctgagat caaactgcaa ggtggcagcg aggctagggg aggggcgccc 68820

gccattgccc aggcttgctt aggtaaacaa agcagccctg aagctggaac tgggtggagc 68880

ccaccacagc tcaaggaggc ctgcctgcct ctgtaggctc cacctctggg ggcagggcac 68940

agacaaacaa aaagacagca gtaacctctg cagacttaaa tgtccctgtc tgacagcttt 69000

gaagagagca atggttctcc cagcacacag ctggagatct gcctcctcaa gtgggtccct 69060

gacccctgac ccctgagcag cctaactggg aggcaacccc cagcgggggc agactgacac 69120

ctcacacggc cgggtactcc aacagacctg cagctgaggg tcctgtctgt tagaaggaaa 69180

actaacaaac agaaaggaca tccacaccga aaacccatct gtacatcacc atcatcaaag 69240

accaaaagta gataaaacca caaagatagg gaaaaaacag agcagaaaaa ctggaaaccc 69300

taaaaagcag agcgcctctc ctcctccaaa ggaacgcaat tcctcaccag caatggaaca 69360

aagctggacg gagaatgact ttgacgagct gagagaaggc ttcagacgat caaactactc 69420

caagctacgg gaggacattc aaaccaaagg taaagaactt gaaaactttg aaaaaaattt 69480

agaagaatgt ataactagaa taaccaatag agagaagtgc ttaaaggagc tgatggagct 69540

gaaaaccaag ctcgagaact acatgaagaa tgcagaagcc tcaggagccg atgcgatcaa 69600

ctggaagaaa gggtatcagc aatggaagat gaattgaatg aaatgaagtg agaagggaag 69660

tttagagaaa aaagaataaa aagaaatgag caaagcctcc aagaaatatg ggactatgtg 69720

aaaagaccaa atctacgtct gattggtgta cctgaaagtg atggggagaa tggaaccaag 69780

ttggaaaaca ctctgcagga tattatccaa gagaacttcc ccaatctagc aaggcaggcc 69840

aacatccaga ttcaggaaat acagagaacg ccacaaagac actcctccag aagagcaact 69900

ccaagacaca taattgtcag attcaccaaa gttgaaatga aggaaaaaat gttaagggca 69960

gccagagaga aaggtcgggt taccctcaaa gggaagccca tcagactaac agcggatctc 70020

tcggcagaaa ctctacaagc cagaagatag tgggggccaa cattcaacat tcttaaagaa 70080

aagaattttc aacccagaat ttcatatcca gccaaactaa gcttcataag tgaaggagaa 70140

ataaaatact ttacagacaa gcaaatgctg agagattttg tcaccaccag gcctgcccta 70200

aaagagctcc tgaaggaagc gctaaacatg gaaaggaaca accggtacca gccactgaaa 70260

aatcatacca aaatgtaaag accatcgaga ctaggaagaa actgcatcaa ctgacgagca 70320

aaataaccag ctaacatcat aatgacagga tcaaattcac acataacaat attaacttta 70380

aaagtaaatg gactaaacac tccaattaaa agacacagac tggcaaattg gataccaaga 70440

gtcaagaccc atcagtgtgc tgtattcagg agacccatct cacgtgcaga gacacacata 70500

ggctcaaaat aaaaggatgg aggaagatct accaagcaaa tggaaaacaa aaaaaggcag 70560

gggttgcaat cctagtctct gataaaacag actttaaaac aacaaagatc aaaagagaca 70620

aagaaggcca ttacataatg gtaaagggat caattcaaca agaagagcta actatcctaa 70680

atatatatgc acccaataca ggagcaccca gattcataaa gcaagtcctg agtgacctac 70740

aaagagactt agactcccac acattaataa tgggagactt taacaaacac cccactgtca 70800

acattagaca gatcaacgag acagaaagtc aacaaggata cccaggaatt gaactcagct 70860

ctgcaccaag cgtacctaat agacatctac agaactctcc accccaaatc aacagaatat 70920

atattttttt cagcaccaca ccacacctac tccaaaactg accacatact tggaagtaaa 70980

cctctactca gcaaatgtaa aagaacagaa attataacaa actatctctc agaccacagt 71040

gcaatcaaac tagaactcag gattaagaat ctaactcaaa accactcaac tatgtagaaa 71100

ctgaacaacc tgctcctgaa tgactgctgg gtacataacg aaatgaaggc agaaataaag 71160

atgttctttg aaaccaacga gaacaaagac acaacatacc agaatctctg ggacacattc 71220

aaagcagtgt gtagagggaa atttatagca ctaaatgcct acaagagaaa gcaggaaaga 71280

tccaaaattg acaccctaac atcacaatta aaagaactag aaaagcaaga gcaaacacat 71340

tcaaaagcta gcagaaggca agaaataact aaaatcagag cagaactgaa ggaaatagag 71400
acacaaaaaa cccttcaaaa aattaatgaa tccaggagct gtttttttga aaagatcaac 71460

aaaattgata aaccagtagc aagactaata aagaaaataa gagagaagaa tcaaataggc 71520

gtgataaaaa atgataaagg ggatatcacc accgatccca cagaaataca aactaccatc 71580

agagaatact acaaacacct ctatgcaaat agactagaaa atctagaaga aatggataaa 71640

ttcctggaca catacactct cccaaactaa accaggaaga agttgaatct ctgactagac 71700

caataacagg atctgaaatt gtggcaataa tcaatagctt accaaccaaa aagagtccag 71760

gaccagatgg attcacagcc gaattctacc agaggtacaa ggaggagctg gtaccattcc 71820

ttctgaaact attccaacca atagaaaaag agggaatcct ccctaactca ttttatgagg 71880

tcagcatcat cctgatacca aagccgggca gagacacaac caaaaaagga attttagacc 71940

aatatccctg atgaacattg atgcaaaaat cctcagtaaa atactggcaa accgaatcca 72000

gcagcacatc aaaaagcttg tccaccatga tcaagtgggc ttcatccctg ggatgcaagg 72060

ctggttcaat atatgcaaat caataaatgt aatccagcat ataaacagaa ccaaagacaa 72120

aaaccacatg attatctcaa tagatgcaga aaaggccttt gacaaaattc aacaacactt 72180

catgctaaaa actctcaata aattaggtat tgatgggacg tatttcaaaa taataagagc 72240

tatctatgac aaacccacag ccaatatcat actgaatggg caaaaactgg aagcattccc 72300

tttgaaatct ggcacaagac agggatgtcc tctctcacca ctcctattca acatagtgtt 72360

ggaagttctg gccagggcaa ttaggcagga gaaggaaata aatggtattc aattaggaaa 72420

agaggaagtc aaattgtccc tgtttgcaga cgacatgatt gtatatctag aaaaccccat 72480

catctcagcc ccaaatctcc ttaagctgat aagcaacttc agcaaagtct caggatacaa 72540

aatcaatgta caaaatcaca agtattctta tacaccaaca acagacaaac agagagccaa 72600

atcatgagtg aactcccatt cacaattgct tcaaagagaa taaaatacct aggaatccaa 72660

cttacaaggg acgtgaagga cctcttcaag gagaactaca aaccactgct caatgaaata 72720

aaagaggata caaaccaatg gaagaacatt ccatgctcat gggtaggaag aatcaatatc 72780

gtgaaaatgg ccatactgcc caaggtaatt tacagattca atgccatccc catcaagcta 72840

ccaatgactt tcttcacaga attgcaaaaa actactttaa agttcatata gaaccaaaaa 72900

agagcccgca tcgccaagtc aatcctaagc caaaagaaca aagctggagg catcatggta 72960

cctgacttca atctatacta caaggctaca gtaaccaaaa cagcatggta ctggtagcaa 73020

aacagagata tagatgaatg gaacagaaca gagccctcag aaatatcgcc gcatatctac 73080

aactatctga tctttgacaa acctgaggaa aacaagcaat ggggaaagga ttccctattt 73140

aataaatggt gcagggaaaa ctggctagcc atatgtagaa agctgaaact ggatcccttc 73200

cttacacctt atacaaaaat caattcaaga tggattaaag acttaaacgt tagacctaaa 73260

accataaaaa ctctagaaga aaacctaggc tttacctttc aggacatagg catgggcaag 73320

gactttatgt ctaaaacacc aaaagcaatg gcaacaaaag ccaaaattga caaatgggat 73380

ctagttaaac taaagagctt ctgcacagca aaagaaacta ccatcagagt gaacaggcaa 73440

cctacaaaat gggagaaaat tttcacaacc tactcatctg acaaagggct aatatccaga 73500

atctacaatg aacttaaaca aatgtacaag aaaatcaaac aaccccatca aaaagtgggc 73560

gaaggacatg aacagacact tctcaaaaga agacatttat gcagccaaaa aacacatgaa 73620

aaaatgctca ccatcactgg ccatcagaga aatgcaaatc aaaaccacaa tgagatacca 73680

tctcacacca gttagaatgg caatcattaa aaagtcagga aaaaacaggt gctggagagg 73740

atgtggagaa ataggaacac ttttacactg ttggtgggac tgtaaactag ttcaaccatt 73800

gtggaagtca gtgtggtgat tcctcaggga tctagaacta gaaataccat ttgacccagc 73860

catcccatta ctgggtatat acccaaagga ctataaatca tgctgctata aagacacatg 73920

cacaagtatg tttattgtgg cattattcac aatagcaaag acttggaacc aacccaaatg 73980

tccaacaatg ttagactgga ttcagaaaat gtggcacata tacaccatgg aatactatgc 74040

agccataaaa atgatgagtt catgtccttt gtagggacat ggatgaaatt ggaaatcatc 74100

attctcagta aactatcgca agaacaaaaa accaaacact gcataccctc actcataggt 74160

gggaattgaa caatgagaac actggacaca ggaaggggaa catcacactc tggggagtgt 74220

tgtggggtgg gggagggggg agggatagca ttgggagata aacctaatgc tagatgacga 74280

gttagtgggt gcagcacacc agcatggcac atgtatacat atgttacaaa cctgcatgtt 74340

gtgcacatgt accctaaaac ttaaagtata ataataataa agaaatgggt cttctctacc 74400

caagaacaca cacacacaca cacacaaaaa aaaaaaaaaa aaaacaagaa agacaaacag 74460

tatgctcatc acacacaaat gcaaggtcat ttttaggcat tagatggggc ttaatttaat 74520

ctaagttttt catgtacttt gtatcttcta acctgcatac ctctatttcc tcttctagcc 74580

ctctacctct ggtaaccact gttttatttt tatcactgta tatttaattt tttaaatatt 74640

ccccatataa gtgagatcat tcagtatttg tctttctgtg tctggtttat ttcacttaga 74700

ataatgtcct ccaggcttgt acgtgttgta tcaaataaca cgatcttcct tttcagggat 74760

gaataatatt ctattgtaaa tatataccac catttcttta tctatttgtc cttcatgagg 74820

cacttgggtt gtttcaatac cttagctgtg atgaataata ctgcaaaaac atggaagtac 74880

agatgcttta agaggtggtg aattcacagg ataaacaagt ctagagatct aatgaacaac 74940

atgaggacta ggggtaataa aattataccg tatttgggat tcgtgctaaa tgaatatatc 75000

ttagctgatc ttgcacacag acacacaaaa agggtaacta tatgttaatg tttgtgttaa 75060

tttgctttac tatagtaaga ttttactcta tgtatcacat gacattatgt tgtgaacctt 75120

aaatagatgc attaaaattt attttttaaa agggttcatt aactgcggtt caggcaatct 75180
ataaaccttt tacattttgg aggtgagatt ttcagtatct gcatttatct tggattttca 75240

aggggttcat ttgaacaagc aaacaaacga agaaacaagc ttaagaatct ttcttacaat 75300

gaattcttcc taagtagttt tcttgactgg tttcttttct ttttctgaaa cagtcttttc 75360

ttattcactt gtgattgaat agcctacaaa attgatctag acaattaggc tcttgaaatt 75420

gattacggcc tagtgcattc tcacctggaa taaaagcctt cctttggagc atcttctatt 75480

ttttaattgt tgagagttgg ggactttcct tctcattctg tattttcagt gtcagttttt 75540

atggcttgtg tgttccaggt atggttgggt catggtcatg cagtctattc acaaggcctt 75600

gtgcttagaa gaccatccca agcttggttt aatgctttga tctcatggtc ctgaaatttt 75660

acataatttt tttgtacaag gattcttgca gttttatttc gtattgaaat aggagagttt 75720

ccttatcctc ctcgcatggc aagtgacagg agtgtggctt gcttctttgg tgccccactg 75780

ctcaaatccc taggggtgga atgcagacag tcaggtgtac aggtcgtggg agtgtttttg 75840

ggctccaacc ccacagcagc gtctagggtt aagtgtttac agctcctgaa gccccaatgg 75900

gtgtgtgtta cagtgtgatc tttcagcttt gctgtcttca ggcggcttgc attaatcagc 75960

tcagttagac cctctgcatt accacaaggg cagcaggctt tctgtattcc aagttcttgc 76020

ccagtgtatc ggaaaaattg gatcacatgt ggacatggag gatgagtgca aggttttatt 76080

gagtggtgga ggtggatctc aatgagatgg atggggagcc agaaagggga cggagtggga 76140

aattggtctt cccctggagt cgggcagctc atttgccgat gatgctctga ccacccccag 76200

acgaactccc ctcggtgtct gcatcgttcc accatcactg gtctgctggc atctgctggt 76260

gtctgctgtt gtgctcctct gctcctgtca acatccagcc acttgtgtcc atgcccacta 76320

aggcctcggg tttttttggg cacaggatgg ggcctgtggt gggccagagt ggtcttggaa 76380

aatgcaacat ttgggtgcaa aaacaggaga gcctgttctc acttaggtcc ttgggcaaag 76440

gtctgtgggt ggagccctca ccagggaccc cacccttctc tacacatcac ttcccaaccc 76500

tgctcctgta tcaatattgg attctgtata ctatgtatct gaggtcccag ttctaggttt 76560

tacaacttgt aactttttcc tatccttgac ctactcctct gctcgttctt cctctatcat 76620

gcagaacctt cccaacatac tgacctagca agcccaggtg ttccttattc aagaggtgtt 76680

caccttttat tctgtcctga cacaggggtt agacctactg tggttttcct tctttgtaca 76740

cacatcttag ctgtagtctt tttacctctc tccacttatg agcccttctt ctctactgca 76800

attgtgtatt tctggatccc ataaaagcca atgagcctct cagtcaaaac tgtttcagat 76860

ctcacaggca gttattaagc cacatgcaac aaataagtta ttctgcaaat gaactcctct 76920

agtccaattt cctactgtgt ctttttmaat ctgaaattcc cacatcctga tatattttct 76980

ctcatttttc tttttgtaac cactgaaact gtactctggt gaaggccact gatactcttg 77040

ataggaactt gaaatgagta acacttaacg tgagaaagat ttgaagtgaa taagagtaaa 77100

gtgaaatatc tgtagtttag ggagcttagc tggattttct gcactttgca ttaagtgtgt 77160

gtgtgtgtgt gtttgtgtgt gcacatacac acacaaattt cactcagtga tagtttccaa 77220

agaaccatta catctggtat ccttgtattt gctcgttatc cttctgtaga cccaggaaat 77280

ctgagacagg tctcagttag tttagaaagt ttattttgcc aaggttgagg acacacctgt 77340

gacacagctt ccagaggtcc tgaggacatg tgttgaaggt ggtcatggta cagcttgctt 77400

ttatacattt tagagagaca taatctatca atcaatacat gtaagattta ccttggtttg 77460

gtgcagaaca actcaaagtc caggttgggg cagggtggcg tgtcgtttcc aggctgtagg 77520

taaatttaaa cattttctgg ttgacaattg gtttgtctaa agacctgaga tcaatagaaa 77580

agtctgtgtt atgttaatgc tggagaggaa taatgaggca tgtccgaccc ccatttctct 77640

taaccgcctg aactagtctt tcaagttaaa ttttacaagc cctggctgag gaggaagtcc 77700

ctttatttat ttttggttta cacttcttat gatatccttc ttagaagttt tcttgctaca 77760

gttgttgcca taaactttac tttttttttt tttttggtag aaccttattc tgtcgccagg 77820

ctggagtaca gtggctgatc tccgctcact gcaacctctc catcctgggc ttgagcaatt 77880

cttctgcctc agcctccgaa gtagctgaga ttacaggcac gtgccaccac acctgactga 77940

tttttgtatt tttagtagag atggggtttc tccgtgttgg ccaggctggt cttgaactcc 78000

tgacctcagg tgacccacct gcctcagcct cccaaagtgc tcgaactaca ggcatgagcc 78060

atcctgcctg gccaaacatt acttttttaa tgaggtcttc cttgactttc ctttccccaa 78120

aactgtattt cctttatttt tctccacaac agttatttga atattattta tttacttgaa 78180

ataatagctc tctctgtcca ctagaatggg aaaaccaaaa aaagcagaaa tttttgaaag 78240

gtttgttcgc ttctgcatct atagtgccca atcacacaaa atattgagta tcagtaaata 78300

cttgtttcat gaatatctat tttgatgccc aaaatgacat gagaagagga aagataaata 78360

ttattatttc tatttacaca tgggaaaaca cagtgattca ttcagttgca aagcttttaa 78420

attactgaag taactctgta atccagggct cctaaccagc tcctaacatc aagatatgtc 78480

atagcttcta cttgatattt cttagatata tcacatacag gtactcaagg gcacaagtca 78540

gtcacatagt agaacatgac taagtgtaag gagaatgtgt ctggcacaat cctgtttatc 78600

atggtgcact atgtaagaat gccatgacta tctgtgtccc cagaagttcc ttatcatcat 78660

atatctcaag gccttccagg tctatacata caattttatg aaacagttta gctttagaaa 78720

ggtgatatgt attgcagcag cagctaataa aggaagtttt taaaatacat aattttcagt 78780

tcttacctca gattctcaga ttgtatgcac tagtagtgtg acctcggttc aattatttaa 78840

cctttttgta cgttgctttt ctcaatagta acagcaagac agagttttga ttataggcat 78900

tctaaaactt acagtaaaat aaatgaggtg actcattttg agatgagaca agaattgaga 78960
agcctgtaat aaaaatttca agtatgaaag aatgggcaaa aattaaaaac ttctcgaaag 79020

aaaggaaaat gtgaattagc aaagattctg atgtgaagag aggtaaaaag ccaaaaggat 79080 

ttgtaatacc tcttgttaga tacaagttgt gattaaaaga ttaaataaca actctgaaaa 79140 

tattaattta ttttctttct aaacacatta agtcattgct ctaaacatcc tcaagatact 79200 

atatgaaaat ttattttatt tcacagttct ttgttcattt atttggcata tgtttattga 79260 

gccattctga ctgtgcattt tgagctatga tcttggaaca cttgcattag aatcctacag 79320 

ataatcatta gagatgaaga ttccagtacc ccacataaga atttctgcaa aaacctgtga 79380 

tatgggctaa taaacaagct tcccaagaat tgcttctaca ctaagtttca ataatcactt 79440 

aattctcctg agaacatgac atcacgcaag gctacatgga gataaatctg tgagtaaaaa 79500 

ttgcaagtca ttcattctta catgctctcc tttttgcatt ttaagaaaat ttagtataaa 79560 

aatatgtgtt gagaagtata attaaaaaat gaacttcact ttgtccctgc tctcaacacc 79620 

attatgtgat gagaaagtat atactttata atgatacaat gggggctaaa taataaatta 79680 

gagaaatata gaatactctg gaaataaaga agagagaatt cagtgacacg tattaggaga 79740

aatggatgag aaatgcttca caaagaagtg gcctttaaag tgggtcatgt agcatgaata 79800 

gaattaggtt ttaaaaagaa caggaagata gaagagatgt agctcagtca tatagataga 79860 

tagatagata gatagataga tagatagata gatagataga tagatgatgg atagatagat 79920 

acaggtgtag gtatagatat atctataggt gtagagaaag aatatattag taatgtaatg 79980 

cctaggcaaa tgatataatg acattgtaga cttacaaaat tatagtaagt ctaccataat 80040

taccaaaacc aaagtttatg gtagtaccca gaaatagata actttcgatg atatatgaag 80100 

atgtggagaa aaactaattc aaatgaacat taacatggta tgtgcatttg tttatattgg 80160 

ctttatttcc atagcttgaa gagcaaactg tcaggaaatg tccaacacaa atggcagtgc 80220 

aatcacagaa ttcattttac ttgggctcac agattgcccg gaactccagt ctctgctttt 80280 

tgtgctgttt ctggttgttt acctcgtcac cctgctaggc aacctgggca tgataatgtt 80340 

aatgagactg gactctcgcc ttcacacgcc catgtacttc ttcctcacta acttagcctt 80400

tgtggatttg tgctatacat caaatgcaac cccgcagatg tcgactaata tcgtatctga 80460

gaagaccatt tcctttgctg gttgctttac acagtgctac attttcattg cccttctact 80520 

cactgagttt tacatgctgg cagcaatggc ctatgaccgc tatgtggcca tatatgaccc 80580 

tctgcgctac agtgtgaaaa cgtccaggag agtttgcatc tgcttggcca catttcccta 80640 

tgtctatggc ttctcagatg gactcttcca ggccatcctg accttccgcc tgaccttctg 80700 

tagatccaat gtcatcaacc acttctactg tgctgacccg ccgctcatta agctttcttg 80760 

ttctgatact tatgtcaaag agcatgccat gttcatatct gctggcttca acctctccag 80820 

ctccctcacc atcgtcttgg tgtcctatgc cttcattctt gctgccatcc tccggatcaa 80880 

atcagcagag ggaaggcaca aggcattctc cacctgtggt tcccatatga tggctgtcac 80940

cctgttttat gggactctct tttgcatgta tataagacca ccaacagata agactgttga 81000 

ggaatctaaa ataatagctg tcttttacac ctttgtgagt ccggtactta atccattgat 81060 

ctacagtctg aggaataaag atgtgaagca ggccttgaag aatgtcctga gatgaaatat 81120 

tgtcatgacc atggtgatgc ctttgtttcc taataaacat taaatcgaaa tctttggctc 81180 

acatgtccta gcgttctgat ggtgagtttt aatattctct gtgagtctat gttgagtgtc 81240 

tcagctaaaa agctcatgct gggtaaaaat gagatttttc taggctttgc tcctccacat 81300 

atattccatg aatcagcagc atgagctctt ccttggaggt tgttacacgt acagaatcaa 81360 

agtctgcacc tcaggtgcac tgtatttaaa tatgtgtttt atccaaactc ctagatgatt 81420 

gataagcaca ctgaattttg aggagcactg ctgtgggtga aacgtggcat gccctggaac 81480 

actgttgtgc tctttttgtt tacaacggca aacaaaataa atgtgctccc agcccaattt 81540 

cttgaatgta ttctattctt attctcgcct gctgcttcag cagagatgtc tttaagaaac 81600 

ccattcttct gcactccaag aaatccattc ttctgtactc ctttcctgac ttgctgtggt 81660 

agaccagaaa ctaggggcca tggtgatggc ttctggtttt tataagtgct cttacatagt 81720 

gaagagtggc aatggaaaaa gaggggaaga gaatgattgt acttttctta aacttgagtt 81780 

tatggatcca gctctctgag ttacataaac ccaccttccc attctgagcc tccagtgtat 81840 

ggaggtcact ttctgtttcg cctgattttg aaaaatttaa gcaggctgat ctgagaataa 81900 

ggactaaaat taaaatggga aatgaaagaa ccgagtaaac aataggtcat acgatagagt 81960 

atccaccttg ctttgaatat tccttctatt tgtaattaag tttaggtaag taatgttaat 82020 

tttttgtttt ccttagcact ggcctgttta tatgtgtcca aggaagtaag gatttcttcc 82080 

acaggaagag aagacattgc tctggcatct tcaagaaact ttagaaagta gaaagaattc 82140 

ctctttagac attatgctaa aaagaattat ttaaggaaca agtaaaaaat atattttttt 82200 

aagatagagt ttcactctgt tgcccaggtt ggagaacagt ggcatgatca tattaataac 82260 

ttcctccaac tcctaggctc aagtgaccct cccgcctgag cctccctagt agctgggacc 82320 

acaggtgtgt gctaccacac tcggctaatt tttaaatttt tttgtagatc tgatatctgt 82380 

ctatattgcc caggcaggtc ttaaactcct gggctcaagt ggctctcctg cctcagcctc 82440 

ccaaagtgct gggattatag gcataaatta ctgtgcctgg tcaagattct tctactctcc 82500 

aaactttcta cttattgcct caaaatacta aatattttct aaatatcttt attggaaggc 82560 

aaattctgcc attacgccat tagaaaataa aaagataaaa tattttgtgt atatattaag 82620 

ttgtacatta agtaactgta tttataaggc ttctgcattt tgttcctgtt gttgtttggg 82680 

tttgttgtta gttttacaag gaaaaaagag ccccaagcta tctaagtcaa caacggcaat 82740 
ttatgacaaa tatatggtta tatttcacag cttctaaaga caagagtaca gctgaatctc 82800

ggtaatcaac caaaatgctg aactctaagg ccaagagaaa ccgtatgtct cattgtatat 82860 

ttcatatcca tttgttacta ggatacataa taattgagct taattatgac ccttgcatcc 82920 

agcaaccctg ctcatttgtc ttaattagag tagaattttt gtttgtttct tctataaatt 82980 

ctctaaaatt tacttcattt catctgcaaa ttaacagatt ttattccttt ccaatattta 83040 

tgagttttat ttatttttcc ttcatcattg cagacttaga ctgatattat aatgtcaaat 83100 

agaggtagtg aaaacaaata tcaaggcctt gttcctcatt tcatttgtgt tttgtttctt 83160 

tgtttgtttt gagacagagc cttgctcttg tcgcccaggt tggagtgcaa tggcacaatc 83220 

ttggctcact gcaacctctg cctcccgggt tcaagtgatt ctccttcttc agcctcctgg 83280 

gtagctggga ttacaggcac ccaccaccat gcctgactaa tttttgtatt tttaatagag 83340 

atgggttttc gccatgttgg tcaggctggt ctcgaactcc tgacctcatg atccacccac 83400 

ctcagcctcc caaagttctg ggattatagg cgtgagccac catacccagc cttgctcctt 83460 

attttagggg gaacatattt aatgtatcat caggttgtat atcattagtt tgattttttt 83520 

tgtggcatta ttaagattcc atttaaaaat ttgacatatt tgttcacgca tctgttgtta 83580 

gaagtgggat tgttttatta agtatcaaat tatcttaagt ttcaggctta atgacagtgt 83640 

aatttcaggg gcataaggaa gattttccca ctcatccaac atagacttat actgagagag 83700 

atgtaattac aattaacaac catgtatttc taatttcaca tgtgcgggat gctttgttct 83760 

ctttttctca tctgatcaag gcaccatggc tccaacatcc tcatctatgg gatatcttgc 83820 

tgacttttgt aggtagaagt aagcattttc tgcattgtgt ccataatcta ctttttaatt 83880 

agaaatttta cttaatttca cactatattc atttctgtgg tcaaaaaaat caaataaaat 83940 

tgtgaattgg ggtggtggct gaaggatagt aataatatgt ttaggcaatt tgggtaaatc 84000 

atggagtatt ttttctttaa tgaagaaata tgctgttctg cttcatccat ttattgcaat 84060 

tccaaaaagc aaactaacta tagtcagatt atttggattt tatgagaatt tcataatatt 84120 

ataattattg caaaatttat taattttaaa aaatcacgtg aaccaaaact aaacacttgt 84180 

ggatataatc aagattcttg ttttcaatat atttactcaa aaaacattta tggataactt 84240 

agcatgttcc aggcagtgta ctgagtgttg tgtatacagt gaatataaaa tacagtttct 84300 

tcttctcttg gataacaggc tggaggggaa gatatatctt aaacaaacaa aataaactgc 84360 

acttaaaatt aagattgtga caagtgtcat aaataagtag aataaggcaa caggagagag 84420 

aacaattatg atggtgcaca gattttactg taggtagaag atgtcttctc ttagggttga 84480 

tatttaagct ggcatcctag agatgattag ggaataacta gaagaagttt gggagaaata 84540 

gaattcaaag cagtaattag aactaagcac cacatcgaaa tgatagaaag atctcaacaa 84600 

cttaacataa caactaaata atataaggag gctgctagct agactaatat gaagaaaaga 84660 

gagatggccc aaaaaacaaa attataaatg accaaggaga tgttaccact gaacccctag 84720 

aaatacaaat aaccatcaga gactattatg agcacctcta tgcacacaaa cgacaaaatc 84780 

tagaagggat ggataaattc ctggacacat acccttcaca agactaagcc aggaagaaat 84840 

tgatttcctg aacagaccaa taatgagctc caacattgaa tcagtagtaa gtagcctacc 84900 

aaccaaaaga aatcccagta ccagatggat tctctgccaa attctaccag atgtacaaag 84960 

aagagatagt accatttgta ttgaacctat gcaaaaaatt taggaagaat gattcttccc 85020 

cagtcgttcc ttaaggccag catcattctg atatcaaaac ctggcaaaaa cacaacaaaa 85080 

aatgaaaatt ttaggccaat atccttgatg aatattaatg caaaaatctt caacaaaata 85140 

gttgcaaact taatccatga gaacatcaaa agcctaatct atcacaatca agtaggcttt 85200 

atccctggga tgtgaggttg gttcaacata cataaagtta aatgtgattc atcacacaga 85260 

cagaactaga ggcaaaaact atgtgttatt tcaacagata cagaaaaggc ttttattaaa 85320 

attcaacatc ccttcatgtt aaaaactctc agtgagctag gtattgaagg aacatacctc 85380 

aaaataataa gagccatctg tgacaaaccc acagccaaca tcacacagaa tggggaaaag 85440 

ttggaagcat tccccttaaa aacctgcaca aggcaaggat gctgtctctc accactccta 85500 

tttaacatgg tattggaagt gttagccata acaattaggc aagagaaaga aataaagggc 85560 

atccaaatag gaagagagga attcaagcta tgcctgtttg cagtcaatat aattttatat 85620 

ctagaaaacc caaaaacttc ttcagctgat aaacaacttc agcaaagtct cagggtacag 85680 

aatcaatgta caaaaatcac ttgctttcct atacaccaat aactgtcagg caaacagtca 85740 

aatcaggaat gcaatccaat tcacaattgt tagaagaaaa ataaaatact taggaataca 85800 

actaactaga aagatgaaag agcactacaa tgagaattac aaaatctgct caaaaaaatc 85860 

agagatgaca caaacaaaag ggaaaacatt ctgtgatggt taataatgag tgagtgtcaa 85920 

cttgattgga ttgaaggatg caaagtattg atccttgatg tgtctgtgag ggtgttgcct 85980 

aaggagatta acatttgagt cagtgaggtg ggaaaggcag acataccctt aatctggatg 86040 

ggcacaatct aatcagctga cagtgtgccc agaatataaa gtaggcagaa aaacatgaaa 86100 

aggttagact ggcttagcct cccaatctac atctttctcc cgtgctggat gcttcctgcc 86160 

ctcaaacatt gaactccatg ttcttcagct ttgagacttg gactggcttc cttgcccctc 86220 

agcttgcaga tggcctattg taggaccttg ccttgtgatc atgtgagtta ataatactta 86280 

acaaactccc atatatatat atatacacac acacacacat atatatacat atatatacat 86340 

atatatatat gtatatccta ttagttctgt ccctctagag aaccctaata cacattacat 86400 

gctcatagat aaggaaaatt aatattatta caatggccat actacccaaa gcaatttaca 86460 

gattcaatgc tattcctgtc aaactaccaa tgacattctt catagaatca gaaaaaaagc 86520 
tatttaaaaa tttataggga accttgaacc caaatagcca aagcaatcct aagcaaaaag 86580

aacaaacctg gagcatcacg ttacctgact tcaaactata ctataaggct acaggaagca 86640 

aaacagcatg gtactggtgc aaaaacagac aaataaaaca atgaaacaga attgaaaggc 86700 

cataaataag accacacacc tacaaccatc tgatttttga caaagctgac aaaagaacag 86760 

ggggaaagga caccctattc aataaatggt gctaggctga ctggctagcc atatacagaa 86820 

gattgaaacc ggacctgttt cttacaccat acacaaaaat caactaaaga tgaattaaag 86880 

acttaaatat aaaaccctga actgtcaaaa ccctgaaaga caacataggc aacaccattc 86940 

tggacatagg aactggcaga gatttcataa tgaagacacc aaaaacaatt gcaataaaag 87000 

caaaaattga caaatttttg gatataaata aacttaaaag cttctgtgaa agaaactttc 87060 

aacagagtaa acagacaacc tacagaatgg aataaaatat ttgcaaacta tccatctgac 87120 

aaaggtaata tccagcatct ataaggaact taaacgaatt tacaagaaaa aaatgatccc 87180 

attaaaaagt gagaaaagga catgaataga tacttttcaa aagaagacgt aaatgggagc 87240 

taaataatga aaacacatgg acacaaaggg gagaacaaca gacactgggg cctacttgaa 87300 

gatggaagat gggagtaggg aaaggatcag aaaaaataac tgtttggtac taggcttagt 87360 

atgtgggtga tgaaataatc tgtacaacaa acccccatga cgtgagttta actgtatcaa 87420 

aaaccttcac atatacccct gaacctaaaa aaaaaaaaaa aaaaaaaagg ttaaaaaacg 87480 

gctctgtgaa ggttctgagt ttgaaaaggg cttgacccaa gaccagtatg gctataccag 87540 

ggaatgaggg agacagtgac agataatgct gatttaggtg tagactatag taaaagtttt 87600 

tttttttttt tttttttttt ttttttgagg cggagtttcg ctctgtcgcc caggctggag 87660 

tgcagtggcg cgatctcgac tcactgcaag ctccgcctcc cgggtttacg ccattctcct 87720

gcctcagcct cccgtgtagc tgggactaca ggcgcgcgcc accatgcccg gctaattttt 87780 

gtatttttag tagagacggg gtttcaccgt gttagccagg atggtctcga tctcctgacc 87840

tcgtgatccg cccgcctcgg cctcccaaag tgctgggatt acaggcgtga gccaccgcgc 87900 

ctggcctata gtaaaagttt ttatttcggt atatcattgg aaaaatgtgc ttccaatgaa 87960 

gcattcatgt atctcttttg ccttcataat tagggcaaat acatgtacta tttgatcaca 88020 

tgtacaggtt acatcttata ttcatatgtc ctagctaagt gaggattgga tacttggtaa 88080 

aagggaaaaa aaaggtaggc aaaatatttt gatctttatt ctgtttgttt tcataggttg 88140 

gcaaataaat aacattttgc taagttacac attcaccagg ttttccataa aatattactt 88200 

cccaacacct ggattaaata ccgtgaacta actcaactac aacaattttg aggtccaaat 88260 

tattttaccc agtagatttt acaatttaac ttctttattc taaccatttt gtgaaaggta 88320 

gtgtccaaca cacaattcag gctgaaagag aatcacagaa acatcaacgc ttatgaatta 88380 

tgtcattatt tttttacctt tattaatcat tccattcatc tgattggaat aatgaaccat 88440

aatgcaaaac cccatgtgaa taaactcctg atccccaaag caatcatctc gtagtgatta 88500 

caaattgttg caatcaagtt gcttaagtcc caagtttcta ttgagcacat gagtctgtac 88560 

caattactac attttccctg gagttgtgtt ttttagtaac attgcaattg tttcttcaga 88620 

gtgtattaaa aaatacttta gtgataagag ctccaaatta caagatgggt ataacttatt 88680 

cttgttaact cattttttgg agcactagtg tctgtgggag tacctttctt ggcatcaggt 88740

aggaataaaa attctgactt ctgcacttgt actttctctt cttgtctcaa ttttggattt 88800 

ttatcagtag gagtcaagtg acaataaaag tagcttcaaa tgttagttta gattctcagc 88860 

ctattaaaat gagtttgtat acatttctta tctagacggg tgatgagaaa gtattacact 88920 

ttaattgaaa ttattcacaa caactgtttc actttacaaa tatcattctc accattcatt 88980 

caaatattga acccacttga ataactattg gagtaaacat taatatgaca ataaattcaa 89040

gtgttgaaca caccttgaat aaccactgga gtaaatatta ataagacaat caattcaagt 89100 

gtttactaga ataatatctg ttgaaaagac cccttataaa ggaaacgctt tatcaaaagt 89160 

tcaaaagaga tcttattttt ttatggctga atagtatttc attgtatcat agatggacac 89220 

aatgaaatat catagacatt ttacattaag taaaataagc caggaaaaga atattaaata 89280 

ctctatgttc tcactcgtac gtggaaacta aaaaatgttt atgtcagaga agtaaaaagt 89340 

agaaaagagg atactagagg cttcctagag gctaggaagg gtagagagaa ggaaggatag 89400 

gcagaaattt gttaaaggat acaaattata gcaaggtaag agggataagt tctactgttc 89460 

tatggtcctg taggattact gtagttcata atatatagtt tcaaatagct agaaaaagga 89520 

tagtgaatgt tcccataata aagaaatgat aaatatttga cataagggat atgctaatta 89580 

cccttatctg atcactatac atcatatgca tcaaaacatc actgtgtacc cactaaagat 89640 

gaacaagtat tatatgtcat ttaaaaataa agtaaaataa aggggaaaaa tttgaaagag 89700 

ggacagatca attcattaga aattgtactt tttctagaaa acataaaaca cttatagaac 89760 

tttacatatc tttctttaaa ttatcagaga aaatttggag caaatttaat attaactaag 89820 

aagtacatga ctagtgatga gtgtagaata gaaaaatcta tgaatgcaaa atcagaatgt 89880 

tagaaatgat gggcacttta gaaactattt tgtcaaagtc tgacattata tagattagaa 89940 

ttctaaagct cataggaaaa aatcaccata tcgaggtaaa gagactgctt tctaaatgaa 90000 

gacttaaatg agtttttatt gacatatcat aggcaagaat aaagccacaa catataacat 90060 

gccctagaac ttaggttcat gcactgtgct tagataatgt gccctgttat tttatttgtg 90120 

agacactaga atggaaaaca aagagaaagg gatggtgctg aagtccattg tgactgtgta 90180 

gacaaggaag gtgctcctgg ctgagaaact gatattcaca ggagttgctt ggctccacca 90240 

actactagac ttatctacct ctgttttaat tgaagatcgt aaaatatcat tatttgccca 90300 
gcagttcttg cagacttatc tatgtcttat tgctcagttg aggtcgttcc tgggctcaat 90360

aactgaaaat aagcttcagt aatctttacc tctgttttct atggtctttt ctttggcttc 90420

aatttctcct agtctcccac acaattcatt ttataggatg acataatggt gacctgatgt 90480

atacagaatg tttagtaact atgaaagcag ttccgtgcta tcttattttg ctatctcaaa 90540

tatttgtatt ctattttatt tcattttatt ttatgagaca ggatctcact gtgtcaccca 90600

ggctaaagtg cagtggtgca atcatggcat actgggccca agggagcctc tcacctcaac 90660

ctccccagta gctgagacca caggcatgca ccaccacacc tggagatggt tttttttttt 90720

aatatttgtt tttgtagaga tggagtctcc ctatgttgcc cagttttgtc ttgaactctt 90780

aggctcaagc ctcagcctac caaagtgcta ggagtaaagg catgagccac attgcctgac 90840

cttgcatttt aactggagcc tgaagaagcc aaacttattc ccaaatgaac atttccacaa 90900

gtaactcaga tatcagtaac atttgtagaa atttactttg cacttttatt gaatcaatgt 90960

aatttttttt ttagttggga ggatgggaaa atcttggaac tccgaatttc acccaaatat 91020

ttaccatgtt taccaattta ccataaatga cacctctagt tttctgttca acaatatatt 91080

caaacagsct actaaatgac aaataatgtt acttgaaaaa tatcttcatg taataaarat 91140

aaacatatgg atatagatat cctgtgaatc aaaatttacc aacatgraaa atatgtgttg 91200

actttgttat ctagtttatc tttagtaatt ccaatacagt gcgaagcaag ttaacattat 91260

tgaagcagtg aaacatttgg agttttattt tggttatatt tcttgcaata agaaacatag 91320

actttaaaaa actcactgat gtctacactg attcctaatt acccagaaat ttcagactga 91380

acaccaatgt catggaaaaa cccctttgca gaaggacatc tggtttaatc catcattgag 91440

gcttaattat ttttctgtgt gttatttctc acacagagaa caatgtatgt atagctatta 91500

caaaactgat tcagcactct ttctaatgct ttaatgtgaa ttacttggca atctatattg 91560

gaatacaagt tagaaaataa gagtcatgaa agataaattt tgctactttt tacacaaata 91620

ttaaataagt gtgcactttt tctaagcaca gggcacactg cgcatggtaa acattaaata 91680

agttatcaaa ggcaattcct acaggcatta aacatcacct ttgatacatt tcagcaaccc 91740

agtttaacca ttggcaaata aataacattt tgctaagtta cacattcacc aggctttcca 91800

taaaatatta cttcccagca cctggattaa ataccatgaa ctaactctac tacaagaatt 91860

ttgaggtcca aattatttta cccagtagaa gtttaaataa agaactagaa atgatcttca 91920

gatgacgtgt tattaaacat ttcacaatca ttcagctatt tattttcagt agactaaagg 91980

ctttagctct gatgcagaaa atttgaatat atcatgggag ttatcaaaat tattttaaat 92040

ttaccaatag tatttgtcat atgcatgctt ttagtaatct cagttgagca tatgcccccc 92100

caacatatat atacacatac atgaatatat atagaggtgc ttgtatgtta cactgaggcc 92160

ctctagaagt gaaacttgaa tgttttaata tgactaagtt cataatctcc tgcccaggaa 92220

attataaatt aaaaacattc cagatctcaa agagcttttc tttactgaaa tattgagaaa 92280

gacatttccc aagtacttta gcctttaaaa tatgattcat gcaagcaact aaattagaga 92340

gttcttggga agagggctgt ggataacaag agaaaagtga gaagagaaga aactaaggat 92400

gtgtgatcaa agaacaggtt aacatgtaaa aaattcaaaa gaggaaaaaa atgcttgcaa 92460

gttatatgag agtattttaa atatttggac gttgtctcat ggattacatc tatttatcta 92520

tctatctata tctatctatc tatctatcta tctgtctatc tatctctcca tctatctatc 92580

atctatctgt ctgtctatta gtttgactta tgccttacaa ccaataaaca cttaataagc 92640

aaccatgatt atttgttaaa tatgatatga tgaataaaca aatgctagta tttaggattt 92700

cacaatacag attgtttgag agtagaatgg attcacgtcc acctgcaatt gatacgcaga 92760

gctctgatat cacattctgt tttattttcc tcagagttta acattttttt ctcctgtttc 92820

tattctctcc ccatttatgg agtgcctctt cagtgctcag tatttacccc aagcattttt 92880

tttcctataa tcaatgatag ttctctacta gtggcacggc tgaatgaagc tggtttccca 92940

aattgtggag attatgtctt acttccttga attttgttta gtttttgagg ttcagggatg 93000

acagcattcc ctgctgctca gtgtgaaatt agggatttta gcctaaagag aaaatttgat 93060

agtgctcttg ggtctttgaa taaggaaggt tgggacatgc tgaaagcctt tgcccacttt 93120

ctggagcttt actttacaca tgtccatata gaaggatctg tgatgttctg caatagagca 93180

acctatttta cctttcccca cttaaaccca gtatttccaa actttttttt ccccataggc 93240

ctattttgtt gcttaatatt cagtggttct tctgaacaaa gttttgaaaa ttctaaccta 93300

aaaactcaga cgatagcaag attaacatgt ttacaacccc ttttccttac tcctaatggt 93360

acttgctcta catcagttga agggcttttc taaaagtgaa atgaaacaga agcggtgtga 93420

tttctttgta gacctaaagt gcctatgtca atgggaggag atactattat ccaaatatga 93480

gttttctctt taagctgtag agggtaggtt gctgcatata tttaaaggaa agagcaggac 93540

atagaaaatt aaattatagt atctttttta tccttttttc tttatttttg tcattatttt 93600

attatttgtt aagtataata tgatgattaa acagatgata atatacaaga tatattgcat 93660

aaatgtaaag attagataaa tatatttatg ccattaaaac ataagattta tatttactaa 93720

ggtgaatatt ttccaaaatg atgcatcctt ctggagagtg aggaaagggc acttttactc 93780

tttacataaa tctttatttt ctaattattg cattttcctc tcaaaggcca aagttgtaga 93840

tttaagacaa gaaatcattt caatttccaa gtaacaggga acgtcagttc taattcatca 93900

ccatcacttg ggtgccatac acacgcacag gcacacacac tcaaaaactg cgtgtcaaag 93960

acaaattcaa agatgtgctt ccttgtagat ttcaggaggc atgttcactc ttcttggtat 94020

gtgttggaga ctttaagttt tgtttgacta cttaccgctt attaaatgtc ttatttgcca 94080

caatagaaaa gtcatccctg catctatgtg aaatatgttt aatttctctt gaaagttgcc 94140

taagacaatt ttctgcatgt actgttcata ctagctaaaa ctgctcccca ctcttattcc 94200

tctaggaaat tcctagttat ttttcaagcc ccagttagat tattgtcctt tgatgcttac 94260

cctgattcct gaaacaatta gttattttgt ttgtattttt attattgaac taatcatatt 94320

taactttaat tttcatgctc cttaccatga aaatcaacca gctctttcaa ggcaagcact 94380

gtgatcagtt gtcttcaatt ccccagcaaa gcaacttgca tgcatggagt gttcagtgct 94440

gtttgtgcac aaatgtaacc atattacaat ggttaaatca tttagcatcc tgaaagcatc 94500

acagagtcaa agtagctaac ttgtgtgaac ccttaattca atttccggga agaggttaga 94560

cactgaagta tgtagttgaa gacactggtg ttgcaaggaa ttgtatccca ctagctcaag 94620

cctacagact tctgttttgt gccctggatt aagataaatt taaaaatcga atgaattgcc 94680

aacactatct ttggtagatt tcccataaaa acaatgaatt ctgatattta aaatgtaaag 94740

ctcttaagac cttacaacac aaggtccctg ctaacataaa gcaaaattct attggcctga 94800

gtgacttctg agtctacact gggaaccact ctatgtttca cttcagtgga ctcttaaacc 94860

caacccactt cactcattct tattagctaa agggctccca tagacattta agttttgatc 94920

tcttatctcc tttagtgttt tacttataaa ataacacaag gcaacattgg gaatttataa 94980

taaaaaatta ttgtatagaa tcccagcacc ataatttccc attgtctcat ttcattacaa 95040

ttgatattat ctgaattaat attatctgaa agtataatct gttcattata atttgtattg 95100

tttgaactct aggaataatt ctatctctct tgattacttt tcattttatt attctggaca 95160

ttataatatt ataaagaagt ttgaactaat gctgctagaa acctgaaaca aatttgtgtt 95220

ccactagcag aacaaatttg tcacattctg ttatatgctg taaacccctt aaatgcattc 95280

aggattaaat taaaacataa atggaccagt agataaaatc tatgaaataa catttggcaa 95340

cttctcttgc tttactccta caaactcctt cgactgaaaa ttccttcatt agtaattcat 95400

ataggccctt gttattccaa atttagaaac cttatctttg gtttattcac aaagaaatga 95460

attgcttttg attttgttgg tatagcagta agaaaaaaaa agtgaagaaa aaaaacccag 95520

aatttccttc aaacccctat gattctggac tctctaatac agcacatagc cagtgaaatt 95580

ctcaaaggtg gagacactaa agcaggggtc ccctaacccc caggccatga actggtatgg 95640

tccatgtcct gttagcaact gggccacaca gcaggaggca agccaccagc aagtgagcat 95700

tcctgcctga gctccgcctc ccgtcagatt agtcttggca ttagattctc atgggagcac 95760

ccaccctatt gtgaactgca tgcgcatgcc agctatctag gttgtgtgct ccttacgaga 95820

atctaatgct tgattatccc caacaacccc accctccaac ccccatccat ggaaaaattg 95880

tcttccacaa aactggtcgt tggtgctaaa aaggttgggg actgttgtta taaaggaagc 95940

ccatgattgt atagtcaact gaaaaaggca gttagtggct taatccattt ttcgacaaac 96000

tttaggacac tattccctca ttccaaaaca cgttggtaac acagaaacca ttctttttag 96060

gaatcatctg catatgatac acccagttaa ttggggcctt tgctttttta agaccaccag 96120

gccctatatt cttgagtcct gcctgactgg ctgtgtgagg gagaacaaga gagaaagaaa 96180

aagagagaga gcaaataaga gaaagagaaa aaaggtaaca tgtaatggac ttctcattca 96240

ccttgattta ttttcatcca tttaagtgaa aaatgctggt acctaagaaa atggttagag 96300

gaaattctac tttggtgacg gaatttattc tcttgggatt aaaggatctt ccagagcttc 96360

agcccatcct ctttgtactg ttcctgctaa tctacctgat cactgtcggg gggaaccttg 96420

ggatgttggt gttgatcagg atagattcac gcctccacac ccccatgtat ttctttcttg 96480

ctagtttgtc ctgcttggat ttgtattact ccactaatgt gactcccaag atgttggtga 96540

acttcttctc agacaagaaa gccatttcct atgctgcttg tttagtccag tgctattttt 96600

tcattgctgt ggtgattact gaatattata tgctagctgt aatggcctat gataggtatg 96660

tggccatctg taaccctttg ctttacagca gcaagatgtc caaagggctc tgtattcgcc 96720

tgattgctgg tccatatgtc tatgggtttc ttagtggact gatggaaacc atgtggacat 96780

accacttgac cttctgtggc tccaatatca ttaatcactt ctactgtgct gacccacccc 96840

tcatccgact ttcctgctct gacactttca ttaaggaaac atccatgttt gtggtagcat 96900

gatttaacct ctccagctcc ctcatcataa tcctcatctc ctacatcttc attctcattg 96960

ccatcctgag gatgcgttct gctgaaagta ggcacaaagc gttctccacc tgcgggtccc 97020

acctggtggc agtgactgtg ttttatggaa ccctgttctg catgtacgtt agacctccca 97080

cggacaggtc agtggaacag tccaaagtca ttgctgtttt ctacactttt gtaagcccta 97140

tgttgaaccc catcatctat agtttgagga acaaggatgt gaaacaagct ttttggaaac 97200

tgatcagaag aaacgtgctt ttgaagtaaa atcagtgtat ctttattagt caaataaaaa 97260

aatctttcta tttatgagaa ctgtatttaa ctttagtagc tttacagtaa agtcaatttt 97320

agatccccaa tgaaaaacta taatctaaca acaaaacata gacatagata gagaaaattt 97380

atggttttac atgttacata tgagtgaaat atgtgtgtgt gaagggtagg gaggtgtata 97440

tgcatgtatg aggaataatt attcttttat tcataacatg catacattta catgtatgaa 97500

tttacatgtg taacaattta catgtaaatt tacatgtaaa ttacatttat aacaacatat 97560

tttttacatt ataaaaaatt atacattgtt gtaataaaac attaggtctc tatggaaagg 97620

taaaaagtaa gagattaaaa tattttattt tctttcccac accctgattt tttattctca 97680

cttcctagag ttacaacgat taacagtttc tttagttttt tttttttaat ttcttcgtat 97740

ctacaaacaa atgtgtggtt atatgtacac acatatatac acgattgtta aaacacaagt 97800

tgactgtcat ggttgctttt tagtggttta caattctgta gcaattacta ccaagaaact 97860
tcagtctcat ggtttagagg aatgtcacaa ctgctggtga gacttgctgt ccctctaggc 97920
tgtttctcta ggagagaaaa cttacgataa cacctctgtg tccatgcctg acacccactt 97980 
atgtctctcc acttgtgaga gtcactttca ataagacacc tttctccttc tgccacgaca 98040 
ggacatgagg ccagcccagc agttattttt caaatattaa ctctggaagc agacaaagaa 98100
tggctcctat aagggtatat gactggctgc aaataactgt gacttctgca gaacttatca 98160 
attcaattct acacttctgg ggactagagt ccttcatgct ttctctcacc tctgcttttt 98220 
taatctattt ttcaggatag atcatatatt tttattagta gcatcattca aattttactc 98280 
acagtggtac tttgtcacag ttagttatta tatacctaat tttagaatga catatgtcga 98340 
cttgtttgta agtattttag tgctaaagga agcaaaggac aagttgatat tttaaggatg 98400 
tgacaaatct ggtgactttg cagttgataa cataagcaaa attgatgaat gtgtgctgag 98460 
tatctacaat aagtcacact tctcctaaca ccaaggactt ttcaaggagc tcgacaatgg 98520 
catagtttta caggggtcat gaaactgtaa acagaaagat caatacaatt tgttaaatgc 98580 
tgtaagacta tggaggtgct aacgaaagag aggaaagaaa aaattatctc tgggcaaatt 98640 
gaagctagcc ctgtgattgt ccaagactag aacttgttta tgctttatgc accttcctca 98700 
gccctcccat agctcctagt atgacgtaat gtcaggtggg gtctgagcaa aataattaag 98760 
aaccaatttg tgatcaaata tgtagatgtt agtttcaatg aaaaagaaaa aaccataaaa 98820 
ctataacaag tcacaactca aaaagacatg atattttttc tcagttgtat ttcaaagctt 98880 
tctttgtacc ttcgagttaa tctctgctta gtatttctac agaaattttc agactcattc 98940
attttatatt attgcatata attattttaa accttgtact actacatgtt cctatttaaa 99000 
taccttgatt ttgaaagtca gattgcccac aaattctgtt aaaacacaca tataaataaa 99060 
aaagctattt ttactaattt tttacattta attaacaatt gttgaattcc gcttagacac 99120 
atacctagat gtattcctgt ttaaatacac atatgttttt agttatgtga aaccagcttc 99180 
ttaattatat aaaattttct gatctcaggg cccatgcttc tatctgtaca gcctagcttg 99240
gcaccgaaga aaggttcaat gactgcagca tgaaagagta cacatcatac acaattttat 99300
taaaagttaa tgaatgtcca ctttcaaagt aaacaaagtg cctcaatttt agaatttata 99360 
agaaaagtgt ctttgaacag accagcagat gactctgagg gaatatgtta ttagtcataa 99420 
gccaacttac ttttcagaaa tacaagtaac cataatgtca ttgatgttta tgactgactt 99480 
tagcaataca aaaccaatag ttacgcccat atacacaaaa gttcaagata gaaaggaatt 99540 
acaccaagtg ctttcatgaa ttgactggct cgctggaatc acgtctatca gcctgttaag 99600 
actatctaag tccatatcat attgtattag acataaaata aatatgtgct gatgaaaatt 99660 
tttggtatta atatgttatt ttagtaggcc ctggatttta gctaaaaagt aaagaataac 99720 
tttcttattg atgacattaa aatcccatta ttaaagtctg tcataagaga tttcaagaag 99780 
agtctccaat ttcatgtctc ttttctttca aagttaattc tgctgaattc tttatttgtt 99840
ggacaaattg ggtaatacaa tgaagaaata atcagaaaaa ataaagaata tttactgtac 99900 
ttttgaaaag aacaataaat atgatatatt attttctcct cacctatcca aaagatcatc 99960 
actctgatat tcctttcagg aaaatttata catgtgtata aaatactaaa tgttataggg 100020 
agtagtgatg ttatgagcct aatgtcttta ctcatataac agatgttttt gggaacaatg 100080 
aataagtgaa aaatatttag acactcacac tcacacatac acacaaaaat aagacagagt 100140 
aatacttttt attatagttc aggtaatatt ttaatgtaag ttgttttact taatctgccc 100200 
atcaacacta tgcattcagt actataatct ccactttaaa gaaaagacaa cagatttaca 100260 
agttgttaac ctaaaagcac ataggtagat tagaagggga gctcaaatat gcctggcttt 100320 
aaagtttatt tgtttgtttg tgttactatt atacttttgc cactatactt cattgaaggc 100380 
aaaatcatac aatgaaaaat ataattattt tctatcttcc agaagtattg ctgatcatgt 100440 
atttgacagc tgcatgcaca ctgaaggata taatgcacta taaaaaattc tttttataat 100500 
gtatagataa atttaacttg ttccataaga caaatttttg cagtcagatc ctaactccat 100560 
catttattag ctaaatgagt agttatgagc atgaactctg gaatctgatt tactgagcaa 100620 
gttatacaac ctctctcggc ctcagtgccc tcttctataa agtgaagaca tagtgatttt 100680 
tcataaggat gctgtcttca cagaagatat taattaacac atgaaacaca cttaggacag 100740 
tactggatac atagtattta ctcaataaaa ggtacctgtt ttaattatta actggaaagt 100800 
tactttacct atcttagact taggttcctt tcataacctt ctatgttttc ttttaataca 100860 
tttattactc tttggatata attaatgtct gtgttctcta taatattgta agtgacatga 100920 
ggtcagaaac aatgatgttt tgagagcaat tattttctat ttttcagcct ggtatgtagt 100980 
atgtgatccc taaaaaatga tagatagatg attgatgggt agatgacaga tagatgatag 101040 
atagatagat agatagaaaa aaagaagtaa aaaaaataat ggagattata atattagttt 101100 
tgtaaggtaa tgggattttc ctgcaaaaca aaacactgat ggtaacttaa atgtgtatat 101160 
tttattcatg tataatcagt ctgcttttgt ctccttcaac agaattcatc tggaaatgtg 101220 
cagaataaat caaaccagag agaagaaatt tatcttcctg aggttggcca gttgcccggc 101280 
gttgcatctt gtcttctttc tcttggttct attaatttgt atttttcttt ttctggaaaa 101340 
cttggcccta aatctgcaca tcacatcctc acactgtctc cttgccaatc tctcaatcct 101400 
acaaattctg gttaactttt tatttggcaa gttcaattcc atagccaatt tatatgaaca 101460 
gcataacatg gtcacattcc cgttaaccca gtgttttgtc cctgcagcca tctgtggcaa 101520 
tcacatccac atggtcatct gccttccatg tagtcacaat attgggtcca aagtgctcca 101580 
cgtttatatg gctgtccttc tctatgtttg catttcatgt aaacttattt aacatggtgt 101640
tatccccaga gctgacgaca tggaatagca tcatatcctg ccaccttgtg gctactccga 101700
ccaaataaaa ccaaatttca ccacacgcca tgttgctact ccataatgtg atggaagtgt 101760 
atgagagttc taactttata ttttccaaac aaatttggaa tctttctact ctttgtaagc 101820 
cagaaaaact tctctgtttt tgaatttcac cttatgaata tgtctagata atgttacctc 101880 
cttgaaggaa tcaataacac aaaacagtag ctgccctata tgtttgtgac tctttttgaa 101940 
caaccaaata aatagaaata gaaagaacgt aacgtttgaa aatagtagca tgatggttaa 102000 
ttttatgtgt caacttgact gggcgacagg gtgcacaaat acttggtaaa atgttatttc 102060 
tgggtgtgtc tgtgaatgtg tgtctggaag agattttacc atttgaattg gaagactgag 102120 
taaaggagag tgttctcacc aatgtgggct ggcatcatgc aatctgctga gggcctgaat 102180 
agaatagaaa ggcaaaagaa gagcagattt gttttctctg ggtgagttgg gacatcaatc 102240 
ttcgcctgct cttggacaat gttggtgctc ctggacctcg gacctgcaga ctctgaccag 102300 
gacttaaact atttgttccc ctaccaggtc ttcaaactta gactgaatta aaccaccagc 102360 
tttcctggtt ttccagcctg caggcggcaa atcacaggac tttttggcct ttatagttgc 102420 
atgagccagt ttccattgtg tgtgtgcgtg tgtgtgtgtg taatcatgta atgttatata 102480 
tatatatata tatatatata tatgtaaaat cttcattcta tatatagaat gcttctcttt 102540 
ctctggagaa ccctgatgaa tacagatttt ggtattgaga gtggttctag agcaacataa 102600 
ttttaaggat atattttctt atctgtttct gaggtttctg gaattggctc tttaatttca 102660 
ttagatttaa agatgctaat gactatttag agtagtcgaa aaagcattga tagtccatga 102720 
cataaactgt ttatacatat atgcaaagta tccacattgg attatcctca tcagacactt 102780 
ataagatata aggaactatct tgactctcta tatacttttg tactttctta gaaaactaag 102840
gattataacg atattgattg gttactctta atgtcactgg acaaagtgct gaaagataag 102900 
gatgagcttg gggattttaa tttccagtcc aaccctgaga gcttctatgt atgtctggaa 102960 
ggagatcctt gtctcctgta gcttcaatta gagaattgca gaaagggatc ctatatgttg 103020 
cgttgggggt ttgtgggaag atcctgatga agctggaaag actgagcccc taaattctga 103080 
taagtgctat ctgacagtga aagaggtttc cctactccta tactcctagt ggaattggcc 103140 
tccccgttcc cagtggtatc agcttttcca cctatgtctt gggtaattaa ctccatattg 103200 
ccagaataaa tggtaaagac ttcccctgag gcagtttcca tacaagacaa tacagattct 103260 
ttttggggcc cacctttacc acccttcttt gcttctaaac ttataactaa acccaagtac 103320 
caacagcccc tgaaagaaga ggtaccaagc aagaccatga ggtggtgcac taaaaaagta 103380 
gatgagtttt ctaatttata caaacagaaa tccagggaac atgtgtagga atggatacta 103440 
agcatatggt aaaagagtag aatgaagata tagttggatc aggtctaatt tgtcaatatg 103500 
agctcattaa acagagattc tccatttaat gttgcaactt ggggagttag aaaaagctct 103560 
aagtttggtt gtttggctga aacatgaatc aaaagatggt ccactgtgag taaactggaa 103620 
atgtttaact tcccttggtt taatgtagag gaaggaattc aaaggtctag agaaattaaa 103680 
attcaagagt ggacttgcca ttgaagacct actcacccat actggaaggg ttcagaagac 103740 
aagcttttca ccaatacatt gataaataaa cttgtgaggg gaaactgcaa ccacacaatt 103800 
ggaaaatcaa aatgcagtaa tagtaactag atcatggggt ggcagaggcc aggtggtggc 103860 
tttcaatggc ccaaggcaag gtgggcatag ttaccataat gaacagcagg ttaaagcatc 103920 
aaatagaata gtctgaccca cacgtatcta tgacattggc tagcacaggc tagttaatta 103980 
tggtgttccc agttgtataa gtaaatagaa agcccactaa attcttactt gatctgaata 104040 
tgcaaaaaca ttttaggtca agtgaacaaa agtctattct gaattataat aactgatagt 104100 
tacagtccct caatcaattt gtagacttga ggcagtttac agacccagaa ctccttaaat 104160 
gaaggggaga caaggtcttc tcaaggaaga tttccatata ttgctaggtc ccacagagag 104220 
ggacctttta tctgggtaac tgcattggag aaaaggagat aatcagactt ttgagtacta 104280 
cttgacactg gctctgaact gacaatttca ggctcacaac atcattatag ccccttcagt 104340 
cagtaggggc ttatgatgat tagattatca gtgaattcca tctcaaagtg ggcccagtgg 104400 
gtccctgaac tcagcctgtg gttatttctc tggttctaaa atgcataaat agaatagaca 104460 
tactggcaga atccccatgt tggtcctcct acttgtggag cttgggtaaa agaaatagtt 104520 
gaaataattt atttaaagtg ttttgtgagc acaaagaaag aaaaaaatta tcttgacaaa 104580 
aagctttttg ggtaattctc ccagagaagg taaccactga gatgggtttt gaaagttgag 104640 
taggaatttc aaagggtaga ggatgaattg caaagacttt tcctaaagag tgaaccaact 104700 
gatcaagaag aatagcttgg ttgtgcttat gacacaaata atagatagga aagtagtaaa 104760 
taataaagtt gaatagtaaa ctagatcaga taaggcaaga atgaaatgct tttttaacaa 104820 
atttagactt aattggatct acatgatgtg gtgatttgaa cgtatcctca aaaataatgt 104880 
gttggaaagt taatccccac cgtaacagtg ttaaaaagta gggcctaatg ggatgtgatt 104940 
aggtcacaaa agttacaacc ttgtgggatt gataaaaagg ctaaataaag gattactttt 105000 
tttaaaaaag caattataaa tgggcttgag gctgcaagtt ctctaacttg ctgtcccctt 105060 
ttcttgccct ctgccttcct ccatgggctg acacagcatg aaggccttta gatagatgtc 105120 
agcaccatgc attttgactt ctcggttacc agaactataa aacaggtaca tttctattta 105180 
ctataaatta cttagtctgt gttattctgt tgtagcagca taaaacagac taagagaaat 105240 
ttggtactca gaagtgtggc tgttctatat cagctgaaca tgtgaaagtg gccttagaac 105300 
tgagtaatga gtagaggcca aaagaatttg caggaacagt ctagaaaaag cctagaatgc 105360 
catgagtgga ggtttaaggg tgattatggt gagggccagg aaatgtaggg aaagtctgga 105420 
acttcttaaa gactacatgt gtggctatca
aggccattct gaaaagatct caggtgaaac 
aggccatctt ttttatacag tggcaaataa 
tctgtggaat acagaactta agagtggtga 
gaagcaaaaa attaaaggtg gtgcacagct 
agaaagaaaa aaaagataga atttgtaatg 
actcttagcc tggccatgta aataataaaa 
tgttcatgta ggatacagta aattcctctt 
gaaaaattgt taagtacaaa gagttctgag 
gtatgtttag cttccctgtt ctttgttctc 
catctccttg cccctagttt cagtaaacaa 
ccttagtcat cctcagtcac ctgctctgtc 
gtccttcccg ccgaaactac tcaccctgcc 
atagccagtc agaattagct tagactgtgt 
gcagtaggga ctagctgctt taggataagc 
tcgccatcgc tccatccttg agatgcaccc 
aacttttgcc tgagtgctag tttcactttg 
ttgctacaga gatgaacata aatagaagga 
ggatgaccct gaaagcattt caaagatctt 
atccagagag atacataaat atatgtttag 
gtatgtagta caatcatata ttttaacgat 
aactaatgga ctagcttgtg aaaacaggat 
aaacagggtt tctcagtaga actaaactgc 
cgtaataaat ttaccagata ttgcttccgt 
ttgtaatcaa tgtttaatca tgaaattaga 
catgttagtg gctagagaga ggagattcat 
tcactcaaca gctttttgaa tttgagaaaa 
caatatataa tgggtctaat aatatttatt 
cataaaaagt tctgtacatt gaatttataa 
cccaaatttt aacatctgta tgtggtacat 
attatcctaa tcatttgcac tctagatttc 
tctacaaaca aagcaacaca aaataaacaa 
gacttttttt tttttttcaa atgaatgtat 
atgtttttgc tgcattagtg gccaagtgaa 
cagttctttc tgccaagtaa tggagggagt 
aagcaataag taaagtcatt accaagagat 
gccattttaa aatccatgtt caacatcact 
gaaatgactg cttctttgtg atttcttttt 
aaacacaagc aaacctttcc ttatttatta 
tttataggta gggctgtcta tttgtgtgtg 
ttatatttag aagccaaata cttattcagt 
taacttcata ggagtatggg tggttatata 
atttgttcta actactgaac aaatgggact 
tgtgcctcac atgttgtcag atctttacat 
tccttcaaca acattttgaa gtgagaaaac 
acaattgtag agctaacatg cagcatagtt 
caaaactttt gagttggttt tgggagatga 
tagcacaaaa taattttaat gttttggaaa 
ggaaggtggg atgtaaggtt aggacatgaa 
tgttctccag aaaaaaagat gaagaaaaga 
gaaaacatat gcatgcacac tgaaataatt 
tacacatgtg tgtattatat atgtctatgt 
ctttatatac atagtcaggg gaagagagag 
atagttccta ttacatgaac agccttttag 
gcaaagttgg agctagagta gaatttgaga 
aataaaaata tagagaaaaa tacagttttt 
gtgagaaata ggaggggagt agaattaaga 
agtgctgtaa aataaccggt gacaaatcct 
gaaatataat agatttaatg ttttaatttt 
atcttctact tgccttccac agagttgcaa 
gtaaacatta cagatttcta aatggcgtgg 
aggtctcaag ttgctgacat caggattatc 
accatagtcc ttgaatataa tctctcacag 
tcaaagtgct ggtagaaata ggggtagcaa 105480
tgagaaacaa tgtactggaa actagagtaa 105540 
attggcagaa ttgtgtccat gtagcagggc 105600 
gctagactat ctggaagaag aaatatctgc 105660 
tattttgggt gcttacagta aaatgagaga 105720 
aaaaggaagc agcatgaagc gatttagaaa 105780 
aagcatgttt gggacagaat actaaggaag 105840 
caaagtttag cctgttaact tcaagaagga 105900 
ttccacttca aagaaccaat caatatgtca 105960 
cattttaaag tttaacctcc tcgttcttta 106020 
ccccctccta gcctctatca cctgttctgt 106080 
cttagttatc tagtcatcta cactgtaacc 106140 
actctggctc atacccctgc tctctttaaa 106200 
ggtccaaccc cagccaatag gggaaagaca 106260 
acctcttccc ctcccttgta cggtgtgctc 106320 
ttctatagaa gtaaattgcc ttggtgagaa 106380 
tagcaccgaa catttacttc caacaatcat 106440 
atccaggttg gatttatcaa gacaatggga 106500 
gtggaaaagc tgggattttt agggcaaaga 106560 
tttttaactg gaagtgagct gggatgtact 106620 
gtggaaactg aaatagggtg actagcctaa 106680 
ttctcgtctc aaaaaaaaaa aaaaaaaaga 106740 
ctttccttgg ttttatttct agttaaatgt 106800 
gtacctggaa agcatagaca ttttgattca 106860 
tgaaataaca gcaagtgaag catgttagtg 106920 
agtcaaagct gctttcttct ctcagtgagc 106980 
ttgaattttc tttctaaggc tcagttcctt 107040 
tcatattttt aatatttaca taaaattata 107100 
aaacttttta ttttaagtat acttaattac 107160 
catggcctag gaaaataaag aataataaaa 107220 
ttcaattata ttcatttact cttggacttt 107280 
agaaaaagaa ggggagcaaa aataccctat 107340
gctcaccgta atgataccgt gtggtagaac 107400 
aagagtgagt aaggtgttcc tttttgagtg 107460 
gaggtagaac gatgcccagt gatacacgat 107520 
gtgaaaaaga ggacacagaa gagatgttaa 107580 
gagctgtctt cagggaacat ttaattggca 107640 
aatgcaacag ggaaaaggag tcaactttat 107700 
tctctgaagg cccattggtt ttcctgcttc 107760 
tgtgttagga actagacaaa gtgttgctat 107820 
agaaatgatg catgattaga aatagttaaa 107880 
ggtatcaatg cacacaatct gagaatttgt 107940
aagaaagtaa aatttcttgc aatggcttta 108000 
agattataac ttttaatcct caaaagatat 108060 
taaagctcaa aaaagttatg taagctgtct 108120 
gtaatttttt aagtcaaata aggcaagttc 108180 
ataccagtac taaggatgtt ggaaacattt 108240 
gtacaaatgt acatggcaga ctatttctaa 108300 
tatatagaag agtgcaagat gatgttccag 108360 
ttgcaaagca tctaactacc caatcctcat 108420 
aaaaactaca tgtgtatgca tgcatgcatg 108480 
gtatacatac aatatacata gtcagggaga 108540
agaggttaaa tcaccatcct tctaaagata 108600 
ttcgtcaaag caccttaatg tgttcactag 108660 
ttaggagacc ccaaaaggtg aagaacttag 108720 
gaggtcattt tagaagagaa ggaaggagag 108780 
gaaaccaggg aggtatcatg aattgggtta 108840 
ctatgattag gcaacaggta gaaaatttga 108900 
aatttttatt ctcttttgca acaaatatct 108960 
tattatctgt gtctgtgtgt ttgcatactt 109020 
aatagatagg agttttaaga tgattttgaa 109080 
taagcattat ccttacattt tataagtcat 109140 
attaatttaa ttctaactaa taacaatcac 109200 
ttcatatgtt ttcatcttat atagcagtca
ttaagagtat ttatattagt tagataacgg 
acagttaaac aaattagaag tttatttctc 
tatcagctct gccaaggttt taggagctcc 
ataaagagtt actctcatct atatgattca 
ccagtgagag gcagaaaaga agaagggaag 
aagtagctta attttctttt attacattct 
aaatctaatt ttagaatttt aaaaaaagga 
ggctttgttt tttaaaattt atttaactta 
agttctggct ctgaggagca aatacttaac 
gcaggctcag gctctgtcaa atctgtctgt 
tctaatcact tcgctatgca tgctcagtat 
gtcatttcca tgctctaatt caggtctgtc 
tgaaattaac tagaagcatg ttagtagaca 
tggtatcctc ctcaaatatt tatcttttat 
tagcatattc tgtttgtcac ataaagaatg 
ctatccttat aacaggaacc taagggattt 
aatcagaaaa agaatgtcag gtaccaggtt 
agctaagtta tctaaacaat taatttacaa 
ctagtggcaa aatctgagca ataaagatgt 
agacacagta gaaatgaata tgagtaagaa 
cctgtttcaa ttaaatcact atctctagtg 
tgcagaggag caatgcaaaa caatgtttca 
aatatctact aatccaacat tctcatttcc 
cattggtgac tgagttcatt ctcctgggac 
tcttcacgct gtttctcacc atttacatgg 
ccctcatcca ggccaacgcc ccggctccac 
tcctttgtgg atctgtgctt ctcttccaat 
tcagagaaga aaagcatttc ctatcctgcc 
ttggtccacg ttgagctcta catcctggct 
tgcaaccctc tgctttatgg cagcagaatg 
gtgctttatg tgtatggagc actcactggc 
gccttctgtg gccccagtga aattaatcac 
ctggcttgtt ctgacaccta caacaaggag 
ttcacttatc ctctccttat catcctcatt 
aggatctgct ctacagaagg caggcacaaa 
gccgttacta ttttctattc agctcttttc 
tccatggagc aggggaaaat ggtagctgta 
cccatgatct acagtctgag gaacaaagat 
aaaagaaaat tgttttctaa ataaacatta 
ttaccctatg atttttcata gagcatagtc 
tcctagaagt gttggggaat ttttatgaag 
tcaattaaca ggaattttga attcaatata 
gctctatagt gaagaatcaa tgtaaatttt 
aaatccttaa aaccagacct gggttttctt 
ctgcttctta aacatatatt aagctttttt 
caaaaagaaa acagcatctc actccatcac 
cacagcttac tgcaatcttg atctctggtg 
cctctgccat agtagttggg actacaggtg 
ttttttttgt agagacggga tttcaccatg 
ccagtaatct acctgcctgg gcttcccaaa 
cccagcctat aataagcttt aagaaaaata 
tgtgtcttac aaagaaagtc aacattccac 
catcccttgt acactcagct gtcatctcag 
ccagccaaaa acacaccaac actttggtcc 
catacctggt ttatttcttg gctttttaga 
ttgaatgtta cctaaaagtc tgtctaagga 
gcatgtctaa aaccacatca acccaaaagg 
cacagatttg catgggatca ctgtcttcca 
ggcctatagt tcccctagag gtaacctaga 
agagactaat gatgaggtgt gtgcaaggtg 
ttcggattat tgaagggcta caacctctag 
aggatgaaac tagtccgttt agacttggtt 
agtttacaat aatcttatta agcaccaacc 109260
attaaattcc tatgagaaca aaacaaaata 109320 
atttatgtaa aattctgagc tagtatgtca 109380 
aactaattta ctcttctttc tctgctttct 109440 
ggatggctca gcactatatc cacattacag 109500 
ggcaaacctg ttttatcaat ggatggataa 109560 
cttggccaga agctaccctt ttgccatagc 109620 
aattctggtc atttttgttt tgttttcttt 109680 
tttattttta agagctgtcc cataattatc 109740 
ctggttaagg gaaaggtact agcttctagt 109800 
tgtctggatg tctggatgat cacttagcca 109860 
gcacaaatat agacagacat gttggacaaa 109920 
gctccaaaaa attacctact ctgaagttta 109980 
ttttgaaaat ttttctgtaa aattttcaat 110040 
tttcccccaa aagtagcagc tagtctaaat 110100 
agctaaaatg tatttttagt tttttactac 110160 
tatctgacat ggcatggaag aattaactgt 110220 
ctcaacttac agttaagtta gataacttat 110280 
aagtgctgag tattcatttg tttattagca 110340 
gagataaaaa agcaaaatgt aaaaaaaaat 110400
attaagtgac ataaaccgaa aagattagac 110460 
tctcctgaca caaaacaaat ttagaaaaac 110520 
agaaaatata aaaatgtttg ccctttagaa 110580 
cactgaagga aattatgaga agaaactgta 110640 
tggccaatca ccgggaatta cagattttcc 110700 
tcacggtggc aggaaatctt ggcatgattg 110760 
acgcccatgt actttttcct gagcaactta 110820 
gtgactccaa ggatgctgga gattttcctt 110880 
cgtcttgtgc agtgttacct ttttatcacc 110940 
gtgatggcct ttgaccggta catggccatc 111000 
tccaagagcg tgtgctcttt cctcatcaca 111060 
ctgatggaga ctatgtggac ctacaaccta 111120 
ttctactgtg tggacccacc actgattaag 111180 
gtgtcaatgt ttgttgtggc tggtttcaac 111240 
tcctatctct acatatttcc tgccacccta 111300 
gctttttcta cctgtggctc ccatctgaca 111360 
ttcatgtatc tcagacgtcc atcagaagag 111420 
ttttatacca ctgtaatccc catgttgaat 111480 
gtgaaagagg cattatgcaa agaactgttc 111540 
ctactgattt ttgttgtgtt gtcattttat 111600 
tgatacaaat atatccaaaa ttatatattt 111660 
tgtgaataca agaaaaaatg agtatttaca 111720 
gttttgcctg cccatacccc aaagtaagaa 111780 
aaaaaatcta atttatactt tagagaaaaa 111840 
tttgcaagga gttaggcata tagttttaga 111900 
taaaaaaaac aacaacaaca acagcaacaa 111960 
ccaggatgga cctgagtgca gcggtgtgat 112020 
ctcaaatcat cctcccacct cagcccctca 112080 
tacaccatga tacctgacta atttttgtgt 112140 
ctgcccagac tggtcttgaa ttgctgagct 112200 
atttaggaat tacaggtgta agccactctg 112260 
gcagttctgc taaactctac taaatacttt 112320 
tttaccatat tcaacctaaa atatgagcct 112380 
ctgttgcctc tgcctgtacc aacctcatct 112440 
acagtgaggc acacttatgt ttccacagtc 112500 
ggaaaaataa atggctagta gacttctaaa 112560 
gatccaaggc taacagacaa tctgttcaat 112620 
tgtaacatgt ggttgtaatg atactatcat 112680 
gtcttgaaga tgtgtgtttt tgggtgtgga 112740 
acttaagata taatcacata caaggaagcg 112800 
agaaattaga acacagatgc cctcatgaaa 112860 
agtaggacag agtcaacaac agcaacaaca 112920 
tcatggagga gttaaaaagt cttatttaag 112980 
aatttatgac aataattaat gctcatgtta
gggctatcaa atctcaagct aaataatgaa 
ttcatgacac tggaataagc aaacataaac 
agacttcaaa gatagcccac tgaaaaagtt 
gaaataaaaa gcacaaatga atatctttaa 
aaattgtaaa ataaagatat ttaatcattg 
ccatgtatat atgtgtgtac atatgtatgt 
gaaaccaata attatatata atatgtagag 
tctctgtgtt ttataggcat tgaagcattt 
cacattgaaa ggaagaggaa gaaacagcta 
ggagagtcag cctttctctt cttcttcctt 
atcactacaa tcctgcatca gtgagtcctc 
cactcatcca ccaagttcag agaggtggcc 
attcactttg tcatctgcct gacattggcc 
aaaggcctta ttcttttcaa aggattttgt 
cagaaataaa atatttcagg attgacctgc 
cattgcatgt ataagtctaa ttgtgtgaca 
ccctttagga ctatctcctt gtctcttcag 
tgtgaggctc tgatacaaag agtaggcagg 
cagccctagc acatccttaa agcacccctg 
ttcagcatct acaggttaac ctccatgggc 
aaccacaaca cagagtacag ggaaaacagc 
ttaagatatc accctgtctt cctatttgaa 
ggccagatct tccaatactg tgtttaatag 
gtcacttttc aaagggaatg tttccagctt 
tttgtcataa atagctctta ttattttgag 
agttttcagc atgaaggggt gttgaatttc 
gataatcatg tggattttgt cattcgttct 
gggtatgtta agttagcctt gcatcccagc 
ctttttgatg tgctgttgga tttggtttgc 
gttcatcagg gatactggcc tgaaattttc 
tatcaggatg acgctggcct cataaaatga 
ttggaatagt ttcagaagga atggtaccag 
tgtgattcca tctggtcctg ggtttttttt 
ttcagaactt gttattgatc tatccaggga 
tgactgtata tttagaaaac cccattgtct 
tcttcagcaa agtcttagga tacaaaatca 
ccaataacag acaaacagag agacaaatca 
agagaataaa atgcctagga atccaactta 
actacaaacc accactcaag gaaataagag 
gctcatggat aggaagaatc aatatcatga 
gattcaatgc tatccctgtc aagctaccat 
ttgtaaattt catatgaaac tgaaaaagaa 
agaactaagc tggaagcatc atactacctg 
caaaaacagc atgatactgg taccaaaaca 
cctcagaaat gatgccacac atctacaacc 
agcaatagga taaaggattc cctgtttaat 
tgcagaaaac tgaagctgga ccccttcctt 
ataaaagact taaacgtaag acctaaagcc 
accattcagg acataggaac aggcaaagac 
acagaagcca aaactggcaa atgggatctc 
gaaactatca tcagagtgaa caggcaacct 
ccatccagaa tctataagga acttaaacaa 
aaaaagtggg caaaagatat gaacagacac 
aaacatgtga aaaaaacctc atcatcactg 
caatgagata ccatctcacg ccagttagaa 
gatgctggag aggacgtgga gaaacaggaa 
tagttcaacc attgtggaag acagtgtggt 
catttgaccc aacaatccca ttactgggtt 
ataaagacac atgcacatgt atgttcattg 
accaacacaa atgcccatca accatagact 
tggcacacta tgcagacata aaagaggatg 
acctggaaac catcattctc agcaaactaa 
agtcagcatt tatatttata ctgctttgat 113040
gaacataaag aggcccaggt tagaagaagc 113100 
ataagtcaga gaatgggtta tcaaccactg 113160 
ctaattcaca gaaattttcc aggaaaataa 113220 
tttatttcag atattaggat tacttgatgc 113280 
aaaataaatg tgaaaattta aactctaaga 113340 
gtacatatat atatgcaaag actgattata 113400 
ttccaagcat atctgcaggc tgaaggagag 113460 
ttagagtgga taggcaggta tataaatttt 113520 
agtccctctc atgaggcttt aagtggtctt 113580 
agtaaatttc atgcagacct tggcagccta 113640 
tgaactggca ccatagttca ctaggaaagg 113700 
atccgcattg ctaaacagaa taaaacaaaa 113760 
ttgctataag atagcctcat aaactgcctg 113820 
gctgtcttca agtgatattt ttctccatgt 113880 
ctcaaatttt agtttgtctt ggttattggc 113940 
gaagtttccg atactctgct tcatgaagta 114000 
ttcttcaatc aatttttcct gaagttcttg 114060 
tgctggcatc cagcccagag ccaccctttc 114120 
accagcaccc tcaacctggc caacaggttt 114180 
ttgagctgca ccttcacaaa tcacaagaga 114240 
ttttcaaaat tggtaaggtt agtagtggtg 114300
taccctattt tttttttgcc tgattgccct 114360 
gagtggtgag agagggcatc cttgtcttgt 114420 
ttgcccattc agtatgatat tggctgtgag 114480 
atacattcca tcaataccta gtttattgag 114540 
atagaacgcc tttttttctt catctattga 114600 
gtttatgtga tggattatgt ttattgactt 114660 
gataaagcca acttgatcgt ggtggataag 114720 
cagtatttta ttgaggatat ttgcatcgat 114780 
tttttttgtt gtgtctctgc caggttttga 114840 
gttagggagg agtccctctt tttctgttat 114900 
ctcctgtttg tacctctggt agaattcggc 114960
gattggtagg ctattaatta ttgccacaat 115020 
tttggcttct tccttgtttt ggagatgaca 115080 
cagccccaaa tctccttaag ctgatgaaca 115140 
atgtgcaaaa atcacaagca ttcctataca 115200 
tgagtgaatt cccattcaca attgctacaa 115260 
caagggatgt gaaggacctc ttcaaggaga 115320 
aggacacaaa cagatggaaa aacattccat 115380 
aaatggtcaa actgcccaaa gtaatttatt 115440 
tgactttctt cacagaatta gaaaaatctc 115500 
cccatatagc caagacaatc ctaagcaaaa 115560 
acttcaaact atactacaag gctacagtaa 115620 
catatataga ccaatggaac agaacagagg 115680 
atctgatctt tgacaaacct ggcaaaagca 115740 
aaatgatgtt gggaaaactg gctagccata 115800 
atgccttaga caaaaattaa ctcaagaggg 115860 
acaaaaaccc tagaagaaaa cctaggcaat 115920 
ttcatgacta aaacatcaaa agcaatggca 115980 
actaaactaa agagcttctg cacagcaaag 116040 
acaaaatggg agaaaatttt tgcaatctat 116100 
atttacagga aataaacaaa cgaccccatc 116160 
ttcacagaag aagacattta tgtgggcaat 116220 
gtcattagag aaatgcacat caaaaaaaca 116280 
tggcgatcat gaaaaagtca ggaaacaaca 116340 
tgcttttaca ttgttagtgg gagtataaat 116400 
gattcctcga ggatctagaa ccggaaatac 116460 
acatacccaa aagattataa atattctact 116520 
cagcactatt cacaatagca aagacttgaa 116580 
ggataaagaa aatatggcac atatacacca 116640 
agttcatgtc ctatgcaggg aaatggatga 116700 
cacaggaaca gaaaacaaaa cactgcatgt 116760 
tgttctcact tataagtggg tgtgaacaat gagaacacat ggacacagga aggggaatat 116820
cacacaccag ggcttgtcag gggttcgggg gctagggaag ggatagcact aggagaaata 116880 
cttaatgtag attacaggtt gatgggtgca gcaaaccacc atagcacata tatacctatg 116940 
taacaaacct gcaggttctg cacatgtatc ccagaactta aagtgtaata aaacaaatgt 117000 
atgtcacctt gttaaaaaac agatagtaga aaaaccagca aaatgagaac tggtgctata 117060 
taaaaatgta ttcaaagaac caagtaacat atgactttgt gctcaacttt aataatcatt 117120 
agggaagtac acattacaat tacaagatat ttgtgttttt atatatctgt acctatatgt 117180 
tgctaaatga caaaaagaaa aaaatgatat tgctaagtac tggtgggact atggaacaat 117240 
tgaatctctc acataatgtt agtgagaata caaatggata caaccagttt ggaaaactat 117300 
taggcagcat aaatttaagc tggacatatt tatactccat aagccagcac ttctactcat 117360 
agttacagca gtcagacatg catgaatatg ttcatgaaga gatatgcaca agactgaatt 117420 
ttgtaatagc cacatctgga aataacctat atataaagtt tggtgtactt attatacagc 117480 
agtgcagatg aaggaattat ttacatattc atcacactca tcatcatgaa taaaaatcac 117540 
caacataata tttagcacca atgtagtgtt cattggagac tagacctaac aaaatatgca 117600 
ctgtataatt tctattacag aaaattcaaa accagacaca attaatctac agtgttaaaa 117660 
gcaaagatat agttaatttg aggttactga ctaggaaagg gcatgaagac agatggtgga 117720 
gtactggtaa tcttctattt gagaatttag aagcaagtaa tacaagtatg ttcactttgt 117780 
aaaaattcat gaagcaggcc gggcacggtg gctcgtgcct gtaatcccag gactttggga 117840 
ggccgaggca ggtgaatcac gaggtcagga ggttgagacc atcctggcca gcacagtgaa 117900 
accctgtctc tactaaaaat acaaaaaatt agccaggcgt ggtggcgggc acccgtagtc 117960 
ccagctacta gggaggctga ggcaggagaa tggcatgaac ccggaggcgg agcttgcagt 118020 
gagccgagat cccaccactg cactccagcc tgggcaacag agcgacactc tgactcaaaa 118080 
aaaaaaaaaa aatcataatt catgaagctg tacacttacg ttttgtgctc attatttgtg 118140 
tgtaaattag acttcaatat aaagcttact aaaaacgaat aaaaatagta ctagtcttca 118200 
agcaagcaaa gcttcattcc aatatcaaag cattctattt acctatcagt acacagaggg 118260 
tattagtttg ctagggctgc cacaaataag taccatgaac ttggtgactt aaacatgcag 118320 
atttatttcc tcacagttct agaggctaga agtccaagat caaggtgtgg gcaaaattgg 118380 
tttcattctg agttctcttt ctggcttgtg gatgatcatc ttatcccgac ctctttacac 118440 
tttatttttc tgtgtgtatc tgtattctaa tctcttctta taaggatgca agttatattg 118500 
gattagggca cagctcaccc attaggcttt attttactaa atgttctctt tagatattgt 118560 
gtctccaaca gtcatgctct gtggtgcttg gagttagaac ttcagcatat gaattttgga 118620 
gagggaggga aggggcacaa tccagtccat aacacagatt aagaaacgtg aaaggctaat 118680 
agaagtttga cacaaagttt gtgacactag tacgagagaa actgtatcag aaaagttgaa 118740 
ttaagttgaa agtaacatgg taaacctaag gcaatgtgag aatccatggc agacatgaat 118800 
gttatcttat ggattttcca atgtaagaag gaaaatactg agaatgaaac ataaggcaga 118860 
gaaggaccag agagttgtga gttcccattt taaatttgtg ttgtgccaaa tgtcatatct 118920 
ctagagaaat tattcagtga gaaaaaaaat ctgacagagt aattgcttca tttttgcata 118980 
tctgtgaaat cccttaggga aataaagtca tcatacaaat attataaatt attcctgtat 119040 
ttgtcaccag aaaagccatt tgatattctt tgtaaggata gctcttccct tattcataaa 119100 
taagtttctg catgtgtttg taatcctgga acacttgttg tacaatcata tgtattttca 119160 
gagttagata tatgattgtg atgattaaat gactaggtag aaagaaaaat gccaattacc 119220 
agaaaaatgt agacacttag catttaaggt acttttattt gttaaagtct tgaataatga 119280 
ggatggaagt taatggcata aaaatataag aggcatgctc taggatcttt cactcaatat 119340 
aaatgaaagc taatatttat taagggtttg ccacacattg ggcacagtgc tatgcatatc 119400 
acatacccca ttttgtgaaa tccgaaaaag agtgcttttg tatttgctaa tttcatctgt 119460 
aatagaaaaa aattatagtc cagaattatt aagaaacatg acccagacta ctcagatcag 119520 
aagttctgac atcagaatgt gaactcaacc agtcgactcc caaacgtatg tttctaccag 119580 
tacagtatgc tttatggttt gtagtggaat ttctttctgt actaaccatg agggaaatat 119640 
gttattatcc atatctatta taggcaaaat gtcatagaat tggtttgagg gttgaatgtg 119700 
ttaaaactta taaaatagat tagtgtttgg cttataagaa acaccatgta agtgctggtt 119760 
aaattagtgg taaaactaaa acacagaata aggaacatgt caaaagaaca gagcagcatt 119820
tcagaaatat ctaactccag atcctgtgaa ttgattttat gctaaggcta tcatattttt 119880 
atcaaggcat tcatctttgt gtttggtcta gtcctaaaac ttaagaatgt caaccggatg 119940 
tgtgcatttg tcaaatacac aaagttgggc actttaaata tatgaatttc actgtatata 120000 
aattattgca taaacaacat taaacattaa acaaagaaac aaaggtgaaa tctgacagga 120060 
gcttgatata taaaaatgaa tgaaggtatg tagggaaaag aaagagagat cagactgtta 120120 
ctgtgtctgt gtagaaagga aagacataag agactccatt ttgaaaaaga cctgtacttt 120180 
aaacaattgc tttgctgaga tgttgttaat ttgtagcttt gccccagcca ctttgaccca 120240 
accactttga cccaacctgg agctcacaaa aacatgtgtt gtatgaaatc aaggtttaag 120300 
tgatctaggg ctgtgcagga catgccttgt taacaaaatg tttacaagca gtatacattg 120360 
gtaaaagtca tcgccattct ctagtcttga taaaccagga gcacaatgca ctgtggaaag 120420 
ccgcagggac ctctgccctt gaaagcggag tattgtccaa ggtttctccc catgtgatag 120480 
tctgaaatat ggtctgaaac caggggcaca ttgcactgcg gaaagccgta gggacctctg 120540 
cccttgaaag cggggtattg tccaaggttt
tgtggcatga gaaagacctg accatccccc 
ggaggattag taaaagagga aagcctcttg 
tgcctgcccc tggggactga atgtctcggt 
tgagatcaga taaaaactgc cctatggtgg 
ttgttattct ttactccact gagatgtttg 
gcatgtccag tcatagtacc ttcccttgaa 
atgtttgttg ctgaccttct tattatcacc 
aataatgaag ataataatta ataaaaactg 
atatgctgag tgccggtccc ctgggcccat 
ttatttcttt tctcagtctt tcatcccacc 
caggccaccc cttcaaaggt acatcactac 
actcgatgtt gtataaatca caaataaaat 
tataagatgt tcttgtgaat atagatgatt 
ttaatatcag tgttaacagt ttttgttgtt 
tgctaagcaa aaccggaaca ttctgaaagg 
aaaagaccag gcttagagaa cagatagcag 
aaccacacag aatcagtgtt tctttgcaac 
gaattagatg aaaaaaaatt ttgtcaaact 
gagaggctag catctgaccc ctgcttaagc 
ggaaactaga agtggaagat caagaggaca 
ggaaaatttc ctccaaaaaa ttcaagtgga 
aaccctcgaa acgcaaatat gtagatattt 
cctggaaaat ctctattgca ttgagtctgg 
ttgtttgctc aggcttctcc aagaggggat 
agatcattca gtcaatattg ttctttgata 
tctgccagaa ttataatcag gaataaagaa 
gcaagtccac ctataaagta catttctcaa 
aaagaagaga agaagaatag agacagaagc 
agggacataa aacaagaaaa ataaagggca 
tatccctact ttagtttacc tgaaacaaag 
gaaaagaggg gaaaactttt ggtaagaaga 
tcatgttgct attgtattgc aatcaatata 
aattatgaga aggaacttca cgttggtgac 
ccaggaatta cagattctcc tcttcatgct 
agggaatctt agcatgattg ccctcatcca 
ctttttcctg agccacttat ccttcctgga 
gatgctggag attttccttt cagagaagaa 
gtgttacctt tatatcatct tggtacacgt 
tgactagtac atggccatct gaaaccctct 
gtgttccttc ctcatcacgg tgccttatgt 
catgtggacc tacaacctag ccttctgtgg 
agacccacca ctgattaagc tggcttgttc 
tgttgtggct ggctggaatc tttcgttttc 
catttttcct gctatcttaa ggattcgctc 
ctgtggctcc catctgacag ctgttactat 
cagacctcca tcagaagagt ccatggagca 
tgtgatcccc atgttaatcc catgatctac 
ttatccaaag aactgttcaa aagaaaattg 
gtcatgctgt cattttattt agcctataat 
tatccaaaat tatacatttt ccttgaaggg 
gaaatatgag tatttacatc aattaacaga 
cataccccaa agtaaaaatt tctatagtga 
ttgtatttta gaaaataaaa cccttaaaac 
aggcatatag ttttaaactg cttcttaaac 
agcatctcac tccatcaccc aggatggact 
caaccttgat ctcagggttt caaaccatcc 
tagctgggac tacaggtgtg caccaccata 
cggggtttca tcatgttgcc cagactggtc 
cctgggcctc caaaagagtt tgcattacat 
ggtttaaaaa agagcacttc tgctaaactc 
aagtcagcat tccactttac catgtccaac 
cagttgtcat ctcagctgtt tcctctgcct 
ctccccatgt gatagtctga aatatggcct 120600
agcccgacac ccgtaaaggg tctgtgatga 120660 
cagttgagat agaggaaggc cactgtctcc 120720 
ataaaacccg atagtacatt tgttcaattc 120780 
gaggtgagac acgtttgcag caatgctgcc 120840 
ggtggagaga aacataaatc tggcttatgt 120900 
cttaattatg acatagattc tattgctcac 120960 
ctgccctcct actacattcc tttttgctga 121020 
agggaactca gaggctggtg caggtccttg 121080 
tgttgtttct ctgtactttg tctctgtgtc 121140 
caactagaaa tacccacagg tgtggagggg 121200 
tatgattgaa tagatgtaga cacagctttt 121260 
ctttgtttca gatattcaaa aataaccttc 121320 
tttatataga aaaatagaag gtacatttca 121380
gttgtaagta acagaacaca gctctggtta 121440 
atgttagggc tcctaaatca atgtgcggtt 121500 
cctaggcagg tgtggaggct gagcagatca 121560 
atcagcctga tcttcaaacc tcactgtaat 121620 
tgagtcatta actcactccc ttaagacaag 121680 
ctttctaaac agagaatctc atctctcagt 121740 
gaatttccat gactctcaac ttgacacatt 121800 
gtgttaatgg ctggaattta atagctgtgc 121860 
ttcaaaataa ggggaacaaa tcatgccctg 121920 
aggaagaaga aagcaacaag tgagccccgc 121980 
tgaactggtg aaatctattc agaaataaca 122040 
tcttgttctt tcctctctct aaactcattc 122100 
attggttagt tttaaatgtt atgttttggg 122160 
atataaagag gggcagaata agagaaggag 122220 
tgaattagaa gggaatgaaa agaaagacaa 122280 
gtatcaaaaa aggcagagag ctgctttgaa 122340 
ttcattagaa ataaaatgag gaaaactgct 122400
ttattactct actggtaata atagtaataa 122460 
tatgtaaagt tcctttctcc cactgaagga 122520 
tgagttcatt ctcctgggac tgacgaatca 122580 
gtttctggcc atttacatgg tcacagtggc 122640 
ggccaatgcc cggctccaca cgcccatgta 122700 
tctgtgcttc tcttccaatg tgaccccaaa 122760 
aagcatttcc tatcctgcct gtcttgttca 122820 
tgagatctac atcctggctg tgatggcctt 122880 
gctttatggc agcaaaatgt ccaaaagtgt 122940 
gtatggagcg ctcactggcc tgatggagac 123000 
ccccaacgaa attaatcact tctactgtgc 123060 
tgacacctac aacaaggagt tgtcaatgtt 123120 
tctcttcatc atatttattt cctactttta 123180 
tacagagggc aggcaaaaag ctttttctac 123240 
tttctatgca actctgttct tcatgtgtct 123300 
aggacaaatg gtagctgtac tttataccac 123360 
agtctgagga acaaggatgt gaaaaaggct 123420 
tttcctaaat aaacatcagt attgattttt 123480 
tttttcatag agcttagtgc aatacaaatt 123540 
ttagggaatt tttatgaagt gtgaataaaa 123600 
tagtttgata tcaatatagt tttaactgcc 123660 
ggaatcaatg taaataaaaa aaaatctaat 123720 
cagacctggg ttttcttttt ggaaggagtt 123780 
atatattaag ctattttttt tttttgaaac 123840 
ggagtgcagt ggcatgatca cagcttacag 123900 
tcccacttac acccctcacc cctgccacag 123960 
ccacactaat ttttgtattt tttgtagaga 124020 
ttgaactgct gagtgcaagt gatctatctg 124080 
gggtgagcca ctgtgcccag cctgtaataa 124140 
tactagatac ttttgcctat cttacagagg 124200 
ctaaaacgta agcctcatcc cttgcacact 124260 
gtaccaacct catctccagc ctaaaacaca 124320 






ccaatgcttt ggtccacagt gaggcacact 
tcttggcttt ttagaggaaa aataaatgac 
aagtctgtct aaggagaccc aaggctaaca 
catcaaccca aaaggtataa catgtggttg 
gtggaggcct atagttcccc tagaggtaac 
aagacgagag actaatgatg aggtgtgtgc 
aggaaattcg gattattgaa gggctgcaac 
acaacaagga tgaaactagt ccgtttaggc 
tttaagaatt tatgacaata gattaacggt 
ctttgatggg ctatcaaatc tgaagctaaa 
cacacctgaa atcccagcac tttgggaggc 
tgagaccatc tgcctaacac ggtgaaaccc 
gggcgtggtg gcgggcgccg gtagtcttag 
ctgaacttgg gaggcgcagc ttgcagtgag 
gtgacagaag gagactctgt ctcaaaacaa 
aaaacaaaaa caagcaaaca aaaaacaaaa 
atgacactgg aataagcaaa tataaacata 
cttcaaagat atcccgcaga taagtttcta 
ataaaaagca caaatgaata ccttcaactt 
ttataaaata aagatattta atcattgaaa 
atatatgtgt gtatatatat gtatgtgtac 
aataattata tataatatgt agagttccga 
tgttttatag gcattgaagc attttacatt 
ttatttattt ttttagtatt tattgatcat 
gcagggtcat aggacaatag tggagggaag 
ctggttttcc taggcagagg accctgcggc 
gagattaggg agtggtgatg actcttaacg 
aagcacatct tgcaccgccc ttaatccatt 
gagagcacgg ggttgggggt aaggttatag 
tcttagtaca gaacaaaatg gagtctccta 
caatctgatt tctctttctt ttccccacat 
tcgtcatcat ggcccattct caatgagctg 
cgggcagagg ggctcctcac ttcccagacg 
ccgggcgggg tggctgctgg gcgggggctg 
ggccgggtag gggctgcccc ccacctccct 
tgccccccac ctccctcccg gacggggcag 
tggacagggc ggctgctggg tggagacacg 
cggaggggct cctcacttcc cagacgtggc 
cagatgtggc ggcggccatg cggaggagct 
agacggtcct cacctcccag acggggtggc 
acggggtcgc ggccaagcag aggcgctcct 
ggctcctcac atcccagacg atgggcggcc 
ggtggcggct gggcagaggc tgcaatctgg 
gaggtggagg ttgtagcaag ccgagatgac 
gcactgagtg agtgagactc cgtctgcaat 
atcactcgcg gtcaggagct ggagaccagc 
tacaaaaacc agtcaggtga gcggggcatg 
gcaggagaat caggcaggga ggttgcagtg 
cggctgggca tcagagggat accgtggaga 
gtggagggag agggagaggg agaccgtgga 
tttagagtgg ataggtaggt atataaattt 
aagtccctct catgagactg tcttggagag 
ttcctcctca gtaaattttc tacagacctt 
tgagtcctct gaactggcac catagttcac 
gaggtggcca tccatgttgc taaacagaat 
acattggcct tgctatgcaa tagcatcata 
ggattttgtg ctgtctacaa gtggtatttt 
ttgacccacc tcaaatttcg gtttgtcttg 
tgtgtgacag aagttcccaa tgctctgctt 
ctcttcagtt cttcaatcaa ctttttcctg 
gtaggcagct gctggcatcc agcccagagc 
gcacctctga ccagcaacct caatctggcc 
tccatgggct tgatcagcac cttcacaaat 
tatgtttcca cagtccatat ctggtttatt 124380
cagtaggatt ctaaattgaa tgttacctaa 124440 
gataatctgt tcaatgcatg tctaaaacca 124500 
taatgatctt gaagatgtgt gtttttgggt 124560 
ctagaactta agatataatc acatacaagg 124620 
aaggtgagaa attagaacag agatgccctc 124680 
ctctagagta ggacagagtc aacaacagca 124740 
tttgtttcat ggaggagtta aaaagtctta 124800 
catattaatt cagcgtttat atttacactg 124860 
aaaacatgaa ataggcccgg cgtggtggct 124920 
ccaggcggtt ggatcacgag gtcaggagat 124980 
cgtctctact aagagtacaa aaaatcagca 125040
ctactcagga gactgaggca gaagaatggc 125100 
tggagattga gccactgcac tccagcctgg 125160 
acataaacaa aaacaaaaac aagcaaacaa 125220 
cataaagagg cccaggttag aagaaacctc 125280 
agtcagagaa tgaattatca actactcaga 125340 
attgacagac attttccagg aaaataagaa 125400 
attacagata ttaggattac ttgatgcaaa 125460 
taagtgtgaa aattaaaatt cgaagaccat 125520 
atatatatgc aaagactgat tatataaagc 125580 
gcatatctgc aggccgaagg agagactctg 125640
tttttttatg ttctcagcac ctttatttat 125700 
tcttgggtgt ttctcgaaga gggggatttg 125760 
gtcagcagat aaacatgtga acaagggtct 125820 
cttccgcagt gtttgtgtcc ctgggtactt 125880 
agcatgctgc cttcaagcat ctgtttaaca 125940 
taaccctgag tggacacagc acatgtttca 126000 
attaacagca tcccaaggca gaagaatttt 126060 
tgtctacttc tttctacaca gacacagtaa 126120 
ttccccctta tctatttgac aaaactgcca 126180 
ttgggtacac ctcccagacg gggtggcggc 126240 
ggggggccgg gcagaggcgc cccccacctc 126300 
ccccccacct ccctcccgga tggggcggct 126360 
cccggacggg gcagctggcc gggcgggggc 126420 
ctggccgggc aggggctgac ccccacctcc 126480 
cctcacttcc cggatggggc ggctgccagg 126540 
ggctgccggg cggaggggct cctcacttct 126600 
cctgacttct caggcagggc agccgggcag 126660 
ggtcgggcag agacactcct cagttcccag 126720 
cacttcccag actgggtggc cgagcagagg 126780 
aggcagagac gctcctcact tcccagacag 126840
gcactttggg aggccaaggc aggcagctgg 126900 
gccactgcac tccagcctgg gtaacattga 126960 
cccggcacct cggggggctg aggcgggcag 127020 
ccggccaaca cggcgaaacc caccaaaaaa 127080 
cctgcaatcc ccggcactcg gcaggctgag 127140 
agccgagatg gcggcagtac agtccagcct 127200 
gagagggaga gggagaggaa gagggagacc 127260 
aggagaggga gagggagagc attgaagcat 127320 
tcacattgaa aggaagagga agaaacagct 127380 
ggttacagca gagaggctgc ctttctctcc 127440 
ggcagcctaa tctctacagt cctgcagcag 127500 
tgggaaaggc attcatccat caagttcaga 127560 
aaaacaaaaa ttcactttgt catctgcctg 127620 
aactgcctga aaggtcttat tattttcaaa 127680 
tctccacctc agaaataaaa tattttagga 127740 
gttatcggcc actgcttata taagtctaat 127800 
catgaagtac cctttaggac tacctcttgt 127860 
aagttctcgt gtggggatct gatataaaca 127920 
cacctcttcc aggcccagga catccttaaa 127980 
aacacttttt tcagcatata caggtaaatc 128040 
cagaaaagaa accacaaccc agagtaaaga 128100 
aaaaacaacc tttcaaagtt agtaaggcta
aaacaaacaa actgatagca gaaaaaaatg
atatttaaag aaccaagtaa cttatgactt
tgtacattac aagatatctg tatctttata
ttgcagaatg acagaaaaac tgatattgtt
cctctcatat aatgctagtg ggaatgcaaa
cagcataaat ttaagctgca catatttata
ctaacaggca gaaatacatg aatatgttcg
aacagccaca cctggaagta tcctatatgt
tggtgtactt actatacagc cgtggagatg
tcattttgaa taaaaatcac caacataata
ctagacctaa cagaatatgc actgtataat
aattaatgta cagtgttaaa agtgaagatg
ggcaagcaga gagatagtgc agtactgtta
atacaagtat gttcactttg taaaaattca
tttctgtgtg taaattagac ttcaatataa
agacttcaag caagtaaagc ttcattccaa
acagagggta ttagtttgct agggctgcca
acatgcagat ttatttcctc acagttctag
aaaactggtt tcattctgag ttctttttct
ctttacactt tctttttctg tgtgtatctg
tcatattgga ttagggcaca gctcacccgc
cgatatcgtg tctccaacag tcacgctctg
aattttggag agtgagggaa gaggcacaat
aatgctaata gaattttgac acaaagtttg
aaagttgaat taagttgaaa gtaacatggt
gtcaggaatg ttattgtatg gattttctaa
taaggcagag aaggaccaga gagttgtgag
gtcatctctc tagagaaatt attcagtgag
ttttgcatat ctgtgaaatc ccttagggaa
ttcctgtatt tgtcaccaga aaagccattg
ttcataggta agtttctgca tgtgttttta
atgtattttc agagttagat atatgattgt
tgccaattac cagaaaaatg tagacagtta
ttgattaatg aggatggaag ttaatggcat
tcactcaata taaatgaaag ctaatattta
ctacgcatat tacatactcc attttgtgaa
atttcatcag taatagaaaa aaactatagc
actcagatca gaagttctga catcacagtg
gtttctacca gtacagtatg ctttatggtt
gagggaaata tgctattatc catacctatt
ggttgaatgt gttaaaactt ataaaataga
aagtgctggt taaattagtt gtaaaactaa
agagcagcat ttcagaaata tctaactcca
attgcatttt tatcaaagca ttcatctttt
tcaaccagat gtgtgcattt gtcaaacaca
cactgtatat aaattattgc aaaaacaaca
aggagcttga tatataaaaa tgaatgaggg
acacagcttt tactcaatgt tgtataaatc
aaataacctt ctataagttg ttcttgtgaa
ggtacatttc attaatatca gtgttaacag
agctctggtt atgctaagca aaaccggaac
aatgtgcggt taaaagacca ggctcagaga
tgagcagatc aaaccacaca gaatcagtgt
ttcactgtaa tcaattagat gaaaaaatat
gacaagcagg gactaacatc tgacccctgc
tttttttttt tttttttttt tttttttttg
gagtgcagtg gcaccgtctt ggctcactgc
cctgcctcag cctcctgagt agctgggact
tactcctttc taaacagaga atctggtctc
ggagagaatt tccatgagtc ttaactccac
agagagtgtt actggctaga gaatttaata
gacacttttc aaaataaggg gaaaaaaatc
ttggtggtgt taaaatatca cccggttaaa 128160
gcaaaatatg aactggtgct atataaaaac 128220 
tgtgctcaac ctgaataata attagggaag 128280 
tatctgtacc tatatatctg tgtctgtatg 128340 
aagcattgga gagactgtgg aacaactgac 128400 
tggatataaa cagtatggaa aactactagg 128460 
ctccataaac caacactact gcccaaagta 128520 
tgaagagata tgcacaagat tgaattttgt 128580 
gtatcaatag taggaaggat ttattaattt 128640 
aaggaattat ttacatattc atgacactcc 128700 
tttagcacca atgtagtgtt cattggaagg 128760 
ttctattatg gaaaactcaa aagcagacac 128820 
tagttaattt gaggttagtg actaggaaag 128880 
atgttctact tgaggattta gaggcaagta 128940 
tgaagctgta cgcctacatt ttgtgcacac 129000 
aggttactaa aaacgaataa aaatagtact 129060 
tatcaaagca ttctatttac ccatcagtac 129120 
caaataagta ccatgaactt ggtgacttaa 129180 
aggctagaag tccaagatca aggtgtgggc 129240 
atcttgtgga tgatcatctt atcccgacct 129300 
tattctaatc tcttcttata aggatgcaag 129360 
tagaccttat cttacttaaa tgttctcttt 129420 
tggggcttgg agctggaact tcagcataag 129480 
ccagtccata acacagatta agaaatgtaa 129540
tgacactggt aggagagaaa ctgtatcaga 129600 
aaacctaagg caatgtgaga atccatggca 129660 
tgtaagaagg aaaatgctga gactgaaaca 129720 
ttcccatttt aaatttgtgt tgtgccaaat 129780 
aaaaaaaatc taacagagta attgcttcat 129840
ataaagtcat catacaaata ttataaatta 129900 
atattctttg taaggacagc tcttccctta 129960 
atcctggaac tctacttgct atacaatcgt 130020 
gatgattaaa tgactaggta gaaagaaaaa 130080 
gtatttaaga tacttttatt tgttaaagtt 130140 
aaaaatataa gaggcatgct ctaggatctt 130200 
ttaagggttt accacacatt gggcacagtg 130260 
atcctaaaaa tagcactttt gtgtttgtta 130320 
ccagagttat taagaaatat gacccagact 130380 
tgatctcaac cagttgactc caaagcacat 130440 
cgtagtggaa tttccttctg tactaaccat 130500 
acaggcagag tgtcatagaa tcggtttgag 130560 
ttagcgtttg gattataaga aacaccatgt 130620 
aacacagaat aaggaacatg tcaaaagaat 130680 
gatcctgtga actgatttta tgctaagcct 130740 
tgtttggttg agtcctaaaa cttaagaatg 130800 
caaagttggg cactttaaat atgtgaattt 130860 
ttaaacaaag aaacaaaggt gaaatctgac 130920 
tacatcacta ctatgattga atagatgtag 130980 
acaaataaaa tctttgtttc agatattcaa 131040 
tatagatgat ttttatatag aaaaatagaa 131100 
tttttgttgt tgttgtaagt aacagaacac 131160 
attctgaaag gatgttaggg ctcctaaatc 131220 
acagatagca gcctaggcag gtgtggaggc 131280 
ttctttgcaa catcagcctg atcttcaaac 131340 
tttgcaacct tgagtcatga acttacttaa 131400 
ctactccttt atggtttttt tttttttttt 131460 
agacagcatc tggctctgtc gtccaggctg 131520 
aagctccgcc tcccgggttc acgccattct 131580 
acaggcgcct gccaccacac ctgccctgac 131640 
tcagtggaaa ctagaagtgg aagatcaaga 131700 
ataatggaaa atttcctcca aaaaattcaa 131760 
gctgtgcaac cctcaaaagg caaatatgca 131820 
atgccctgcc tggagaatcc ctattgcact 131880 
gagcctggag gaataagaaa gcaacaaatg agcccaacct gtttgctcag gcttctccag 131940 
gaagggatcg gattggtgaa aactattcac aaagaacaag atcctcgggc aaatattgct 132000 
ttttgatact tgtccttttc tctctttacg cgcattctct gcccagtatt ataatcggga 132060 
aaaaagaaat tgcttagttc tgaatgttat gttttggggc aagtccacct ataaagtaca 132120 
tttctcaaat aaaaggagag gcagaataag gggaggagaa acgagagaag aagaatagaa 132180 
aaacagaagc tgaatttgaa gggagtgaaa agaaagaaaa tggaaagaaa acaagaaaaa 132240 
taaagggcag tatcataaaa ggtagagagc tgctttgaat atccctaatt tagtttacct 132300 
gaaacaaagt tcagtagaaa tagaatgagg aaaactgcca aaaagagggg acgtttttgg 132360 
taaaagatca ttactgtaat agtaataatc atgttgctat tgtattgtaa tccaatatat 132420 
atgtaaagtt cctttcttcc accgaaggaa attatgagaa gaaactgcac gttggtgact 132480 
gagttcattc tcctgggact gaccagtcgc cgggaattac aaattctcct cttcacgctg 132540 
tttctggcca tttacatggt cacggtggca gggaaccttg gcatgattgt cctcatccag 132600 
gccaacgcct ggctccacat gcccatgtac tttttcctga gccacttatc cttcgtggat 132660 
ctgtgcttct cttccaatgt gactccaaag atgctggaga ttttcctttc agagaagaaa 132720 
agcatttcct atcctgcctg tcttgtgcag tgttaccttt ttatcgcctt ggtccatgtt 132780 
gagatctaca tcctggctgt gatggccttt gaccggtaca tggccatctg caaccctctg 132840 
ctttatggca gcagaatgtc caagagtgtg tgctccttcc tcatcacggt gccttatgtg 132900 
tatggagcgc tcactggcct gatggagacc atgtggacct acaacctagc cttctgtggc 132960 
cccaatgaaa ttaatcactt ctactgtgcg gacccaccac tgattaagct ggcttgttct 133020 
gacacctaca acaaggagtt gtcaatgttt attgtggctg gctggaacct ttctttttct 133080 
ctcttcatca tatgtatttc ctacctttac attttccctg ctattttaaa gattcgctct 133140 
acagagggca ggcaaaaagc tttttctacc tgtggctccc atctgacagc tgtcactata 133200 
ttctatgcaa cccttttctt catgtatctc agacccccct caaaggaatc tgttgaacag 133260 
ggtaaaatgg tagctgtatt ttataccaca gtaatcccta tgctgaacct tataatttat 133320 
agccttagaa ataaaaatgt aaaagaagca ttaatcaaag agctgtcaat gaagatatac 133380 
ttttcttaaa aatcagtatt cttttggttt ctaaagccct tcctagactt ttttctttag 133440 
ctgagaaata tagtgcatca atggagaaca ttgcagtttt caaaacttta tttattttta 133500 
ttttattatt atattttgag atggagtttc tctctgtctt aggctggagt gcagtggtgt 133560 
gatctcggct cactgcaacc tttgcctccc gggttcaagc aattctcctg cctcagcctc 133620 
ccgagtagct gggaacacaa gcgcacacca ccatgcccga ctcatttttt gtattttagt 133680 
agagacaggg tttcaccatg ctggccaggc tcggcctccc aaagtgctgg gattacacgc 133740 
atgaaacacc gcgcccagtc taaaaacttt attttctaaa attcaaatac gtacaatttt 133800 
gttcacaaaa gcttttatgt tttaagttgt cattcatctt gttcagcagt tattttaagt 133860 
tctttgtctt ccgtgcacag aatggctttg tacctccatg cccttaggtt taggtaagat 133920 
catgtggcta tttctggaag atgagattaa aattcacata tgtcacttct cagttgaata 133980 
cttaattgat tgttagarca tttttgagtg gatctgttgt ttttattttt tccatggtga 134040 
ccagcaagac ttaatatagt ggctgcccag tcaggagaca tggaacactg ttctcatccg 134100 
actcacaatg gcaatacaaa acggctgata aataacgttt tcataagcta ctatgatatg 134160 
tggagtttgc ttgtttccct agcacaaaat agcctattct gattgatgcc atattaatta 134220 
tacaaattca cacaggctga attatataat atagaatgag aacatcacaa accttgaaga 134280 
aatcacattt gagattttgt tttgtgttct tgtagtcatt tttaatgctt ttacatacac 134340 
atgtgtgttc agttaaaatt atttattata attaccattt gaacctactc atgattaatg 134400 
tatgtagttt taaatgtcaa ataatattac aagccatata taaactcctg tctcacttct 134460 
accttaccac taactcccca tagataatac tttcttgtta tttcctttgg aaattaaccc 134520 
attttaaatt atatgcttgt attcctctgt actgattttt caaatatgca cataatttat 134580 
tgacatctgt ttattaaaga taaggtgcca tcattctttc accaatcact tcattcttac 134640 
cctacaatga tcacttcttt ctcttcattg tcacagtagg atcacatttg aatttttggc 134700 
ttaatcagta tctactgtta aaattactat gaatgtatca ttaataatac ctgaagtata 134760 
ctataatact ttgagcatca ctaacactag tattaattat taataatatg aatactacta 134820 
tgattgtaat tctcttcttg cacagttact ggattttcat aaagttgaca atggcctcca 134880 
gttttgctct tctttttgta tagtagctac ttttccccaa actgtcacta aactgtaaag 134940 
cacctttata gggtcaaaca caatagatga tctatagttt ccatattttt ttctcagacg 135000 
ttctctgtcc tcccattgga ttctgctctg tttgcagagc tgttatttga agataatgat 135060 
tccaatcatt ggtttaaggc atttcattca ttcaacttga ttacatgaca atatgcaaca 135120 
aactggagat tcaacaaaaa taagacagaa aaaagtctta aagagagttt tatggaacag 135180 
ttaatgaact tttctgctgt tatcatgagt ggcacaaggt cagagataac taggtccatg 135240 
catgtttgta tttttctgca atgtcagaat tttcttgatg ctatttcaat cataaaatcc 135300 
atgagctaca tggggttccc gaggaggcaa ttctccttaa aacttcctgt tcactcagta 135360 
gtcagagcca tgggcaaaca ggctcaagtc attcaacaag tcagtcagta ttgcagtcca 135420 
tgaattacag tatatttaat cagtctataa atgtaataga ttacacattg tacagcaaac 135480 
aaagtaacat ttatcatcaa gaataaaggg actggaaatg ggttaaggaa ccagtccact 135540 
gagagtgaca tgaacaaaaa gaatagccta ctcaggccct ggtggtccat caataacctt 135600 
gtaaagaaga gttttcgttg tgggcaaagc cttcagtggc aaatgcaaga tgcttatctc 135660 
aagtgataac aagatggtgt cttttaaggt ggctgtttca agctgctgaa atcctgctct 135720 
tttatggaca cagagtcctc tagtaagaac tgatagtgga agagtgactt tgttatgtcc 135780 
ttatctggtt ggatgcagtc tttcttgatt aggcaaaaca tctggccctt gttggcatga 135840 
tgcctttaaa aatgtaagat gaagtcattt tctaagatgg agtaacttat atcaatgggg 135900 
ctctatacta cgattcagcc ccaggttccc ttctacactt tcctttaccc ttaccatttc 135960 
agccttagct gaagagctgg gtgcagtggc tcatgtttgt aatcccagca ctttgggagg 136020 
ctgaggtggg tgaatcacca gaggtcagga gttcgccact agcctggcca acatggtaaa 136080 
accctgtttc tactaaaaat acaaaaatta gctgagcatg ctggtggaca cctgtaatcc 136140 
cagctatttg ggaggctgag gcaggagcat tgcttggacc caggagacag aggttgcagt 136200 
gagccgagac cgcactgttg ccctccagcc tgggcaacaa gagtgaaatt ctatctcaag 136260 
aaaatagact tagctgaaga attatctctt cctgcaaggt ttctttaaca cattcaggat 136320 
gtgtagtgtc aactgaggaa tgatgagatt cataaattta gaaaggtgga ctttctcata 136380 
aagggttgta gcctgtaggg tgaccgttct gacaggctgt gaagcatatc ctccagctag 136440 
aagttggaaa gagacacttc gacagtatga agagtaagac agggatttat gctgaatggg 136500 
atgaccaaat atattacaca tatatgataa tacatattca acaggctata gaaaaaacta 136560 
tgaatattca caaagaagag gcacaggcat gaatagtagg ctaatataag caacatgcat 136620 
cccatgttcc ctttggagtg gggacttaac atttaaatgt gtcatgatta gactctatgc 136680 
accaaaaggt gaatcagaag acaccaagac cctctgtgca cagcctctgt tgactggcaa 136740 
gagccactag gttattggtg gtctcttatc aagaaggaat gctggtcaat tgctgtgttg 136800 
aaaccgcaaa aagagaagtc cagtgtcagg tggtttgcag atatcagtgg tggtgcgagt 136860 
ctcacaaggg caggtttctg tttaaccctt attgtaggaa gcctaatggt gtttagcaag 136920 
ggaggggaga taacgaggca tgtctgatct cccatctgtc atggcaggaa ctcagatttt 136980 
aaagtttttc tgaggtttcc ttgaccaaga ggcagtctgt tcaattgatt caggggttag 137040 
gattttattt gtatttctca ttaggtacca catgatgatt cacccacaga ttcacaatta 137100 
tattattgca ttcataacat ggttctataa ttatatgcat gtaaatctgt tccttccact 137160 
attttagcaa gttctcaaag gaaaggacca catctttttg tttttatatt tttaccacct 137220 
taagatagtg ttctataaag ggaggatgcc catttttttt ttgaaactgt gagaacaatc 137280 
ccttccactt tctacctttg tctgtaatta tggccaatta cagatttctc tccatgatac 137340 
tcgcttctcc catcctaaca tatattcaag gcagaacaat agatcattta gtttaagaaa 137400 
accatgttca agttctttat cataaatggc ccactgaaag cccagcaacg tgaatcataa 137460 
ctgagcaaga attggagaaa gtaatttcat tggcagcaga caggaaagat cacatactac 137520 
atcctattct tcatagcaga gagacagata acaataaatg ctgaactaca gtaaaagatg 137580 
ttaagggaaa tatttgtgag gaataaatct ttgtagcaat gtattttcct tgatatgcaa 137640 
tcataattat ccatagcatt tgggaaacaa ctgacaattt ttatcacctt tataaatgtt 137700 
agttttgatc tttatagcaa ggttataatg gaaaaatcaa ccactgtgta ataaaattat 137760 
tttaaaatga acagaattac actaggctgt ctgggacaga ggcaagggaa gggctgagtc 137820 
atgatttaag gtgcaggaaa caaaagggac ctcattgttg ctcagacaga aaagaggttt 137880 
ggtgggaatc agacaacagg tatatattga gacaacgaaa tatccaatcc ttgaaaaagt 137940 
acattcttgt gcatcacttt ttcatagcca accgtcctaa gatttatgcc atgtgataag 138000 
ctgatgataa aacattctct tcaagttgaa acagaataca gttgtggaaa aatatttggc 138060 
tttgtaatta ttcagaactg agcctgaatc ctagttttat tactattttg ttagctgggt 138120 
aaacttagag aagttccctc tccatatcag tttattcaaa tgcaaaaccc catttcatgg 138180 
agttattgtg aatatcaaat attattttta atatacattt tcctcaattg cattttgagg 138240 
caagtatgct gagttccagt gtggactcaa tttcatgaaa gttttttaac atgggaaaca 138300 
tgatcaaaac aataagttaa atatgttagt tattcattta tttaacatat gattattgtc 138360 
cctgtcagtc actttcatca ttggcagtca ccagtctctt ctacctgttt cattgtttct 138420 
tcatcagtct cttcctgttg tattgtttca tctatctgtt tctaaaacat ttcatgtttt 138480 
ttccagaaat tatttgtacc tagtattatt tgtttacata tcttaggcat cccatcaaaa 138540 
tgcaaatccc tttgttgaag aaatccttta aaaatatttt tattattcag attaaatagt 138600 
attgaagttt gtacttagta catattcttg tgtgaacttg ataaaggaca aacaatggag 138660 
gaaatatggt agtgctcttg agatgggaag gacaattacc tgaattcagg tttcatgaga 138720 
agttaagtgt cctcagacta ttttaaattt gtttttcagt attgtataaa gtgctaccct 138780 
cgttggataa caggtgattg ctttgagctc tactgaagta taaaatatag atttttttct 138840 
aacatcttga ttcaagataa aatagtacaa ccaattagta tttcccttga gaatgttctc 138900 
cactgtgatt tcacttctaa ttacctcttc ttacacattt ggggattctc ctgagatata 138960 
tgctgatgcc agtgatgtgg gtatatatct tccagcactg gccagaattt gacctttggt 139020 
gtcatgaaaa agacaccctt attttcacag gtaagaaagg aaatgttttt gcatattatt 139080 
tttcagtatc taataatgtc tgtattcact ttgttaaatt gtgtacattt tagacagcaa 139140 
tgtgtttaaa taggtagttt tgaggcaaat ttgcacagac aatgcatttt ccatgaaaga 139200 
aaatctgtag ggagcttact tagcttcatc tttcactttt ttgatattac cagtgttatt 139260 
gaaacatcct ggtggaattt catagtatgg ctatttcagc agatgttttc actgttatta 139320 
gtttcaagta atattatttt tgtgttttcc atttgcctaa tgtcttggtg cctactgtca 139380 
ccaatcgcca caagaaaaag gcaagttgaa aataatttaa gtctacatag cttaattaat 139440 
cacataaaat ttttctgacg tctaagtaaa ataaatgctc aaacaggctt tttattgtga 139500 
ccattaaaca tgtgtgaaag tggtccatta gcttcagtat gtggagctaa cgaacttggg 139560 
tgagtagatc ttaagttctc ttgggttctc ttattttata agattctggg ttcttttcca 139620 
atgccctatt tttgtctaga gacctagttt ttttacaagc aatacttgtg cctcagtgag 139680 
actttagttt tcagaatatt attttacttt tttcaacaaa atacttgaaa cataaatcag 139740 
cctgattata atcaaatcag tctgattata atcaaatcag tctgattcta atcaaataac 139800 
taacttgtgt caaggacctt aaacagtaag ttgatgaatg aaattctcat acctattttt 139860 
ctaggctgca caagtagtca cagaatcatt ctcgggtcat tatcaaattc atcatcatca 139920 
aagttattga ctgcatttga atgtttgttt tgagtgtttc ttgcctctca gcagcagtgg 139980 
acagatttct catgggctct atatctgtat attagctgtc tctgattgcc attgtcccgc 140040 
aatgctagaa ttaaaatatg gttaaygata atggtctcca ctaaaagtgc atgtccagat 140100 
gaccagtgaa acctgagcaa atttgcaaag ctccttagag actaagtcag gactaaaacc 140160 
aagaccctgg attgaagaag ttgctcacca ccagagagtt aaaagaagcc cttgtgcctt 140220 
ccttcttgcc aattaagtta caattcatag gcattattct agattcttgg cccagatttt 140280 
tcatattctt tacagtttcg gcactgatac ctcggctggc tcgctttctt cttcctcatc 140340 
catctcttat ctttgttaga gtgcttgtca ctatcatctg taactcaaag acaactctaa 140400 
ctcaaagtcc tcatttattc atatttgttt cctgggacag gataaggagg caataatatt 140460 
agataaattg aattttaaca ttttggaaac aatttctgtc attacccaac cagtcaatgg 140520 
aaccaaagga agagatagac ctttctggga aagtaaaata ttacagaagg agagcaccgt 140580 
tctgagaatt ataatatatg atttgtcttc tgattcttct gcttatgagt tgtaagctgg 140640 
gaccagaatg tcaaattttc tgagcctggt tactcaccta actaaattta tttctactat 140700 
attgtccact gctctatttc tagcatccag caccatgtct ggaacttgaa aaatgctcaa 140760 
taattatctg ctataagcac gcatgcatac acaaatgcat ggctgacaac atattacaaa 140820 
gtaaattagt gtttttttaa ctataaaatt tcactttaat aagagtttag caattgttgt 140880 
aagatctcat acaatttaat agaaaatatg tgtctgatat tcttagatta ggtgaaaaat 140940 
aaagtgctac tcgtattgac ttcctgagac tgctcagaag agataagttt tatcagcttc 141000 
tgacaagttc tccactaatg acatttgctc atgtctgaat tgccccatgg aaaaaaatta 141060 
gatgtaaatt tcattttacg caattgttag tatactgaaa ttgtttttaa ccaaaatttg 141120 
gagacagatt aacacatatg atattttaac ttaatttgat attttatata tttaarataa 141180 
tttgtgataa attcagtaaa tgcataatgt ctatataaat tgtaagctac aaaatataat 141240 
gaatacacat taaaccactc cctagaatta taagtaattt ctgtgcattg tttaaagtaa 141300 
atgcacaatt taataatttt cccatttaaa taatctataa tccattaagc caatagtaaa 141360 
tactaaaata gtccctcttt ccaaccctga tctgaaatgt catctttgtt atatgccaca 141420 
tcacctatat acatgtagct gttcctggac tttctattct ggaacattag ttaaagtgta 141480 
tatttttttc caatttcatg tcaataacac actgtcctga atactggttt tacagcaaat 141540 
cttgatatct aataagtaac tctgcttatt gccctgagag atattgagtc ttacggttat 141600 
tttatacttc cgtataaata ttagaatcag ctattccaca agaattgtgt ttgaattttg 141660 
attggaattt atttgactct atgggtcact taggaaagac tgaccttttt atgctattga 141720 
atccttccct tagtgaacaa agaatctctc tccattttta gttctttctt tctctctctt 141780 
ttctttcctt tcctttcctt tcctttcttt cctttccttt cctttctttc tttcttcttt 141840 
ctttttcttt ctttctttct ttcttccttc cttccttcct tccttccttc cttccttcct 141900 
ttctttgttt ctgagacgaa ctctcgcctc tctttgccac ccaggctgga gtgcagtggc 141960 
atgatctcag tgcactacaa cctctgcctc ctgtgttcaa gtgattctcc tgcctcagtc 142020 
tcctgagtag ctggggttac aggtgcacac ctccatacct ggctgatttg ttttattttt 142080 
agtagagaca gggtttcacc atgttggcca ggctggtctc gaactcatga ccagaagtga 142140 
tcaacccgtc ttggcctccc aaaatgctgg gattataggc ataagccacc acatccagcc 142200 
catttttaag gctttttaat gttgtgtggc aatggcatca aagtgatcac ataatttatt 142260 
ctgggtatat tagcagggag taagtataga ttgagaacct gaaaacattt ttttaatcat 142320 
cttagggtga ttgagttaat tccttttatc ctaatggcta aactacacat cagggttaac 142380 
ataatggagg aataggaaat tctttgatat aggacttctt tatacagaga tataaaagag 142440 
catgtgagcc aagcagggga ctataatgtt ggttctatgc agcatgaatg ttgtaaaata 142500 
gcaagaaatt aaaatatagg aacaaattaa aacatagatt gatacctcaa agataataca 142560 
tatgaaattt catatagaat tgagatattg atgaaggagc ataaatacac ttgtttaacc 142620 
caacagacag acaggctaaa ttgggatgtg caaattggaa aaaatgctta taaacacgtg 142680 
atatcaatgt ttaagactca aacgaaaggc tttatttaca taagttccca aaaactaact 142740 
tataaacaaa gaaccaatat acatatcaca atgaaaagca ctactactaa aaacaacaaa 142800 
atataaaaat acaccgagac agtaagttga actcggtgag atatatagta tattagcaga 142860 
tcaccgaatg gggatgtatg agataagcat ctcagtggga ttagtcactt catctctcta 142920 
ggattcagtc tctcagtctt cagaatgaag agctttgcct gcattatcct caaagtcctt 142980 
tctacctctc catgtagctt tgtatgtata tgtaagcatg tcattagaag ttctaagcat 143040 
ttcaaaatct ctattacaat ttaatcttta acatataagt tatctagatt tttaaaagaa 143100 
atttcccaca gcataaggat actttcgttt ataattttgc attaagcctc caacttaatt 143160 
gcacactgtg attacaaatt cagcctcagt ggaaaccact tgttgcaaat gtttagactt 143220 




gcttcatagt ataactgttc atcttcagtt acagaactgc tactgagata acataactaa 143280 
agccttttgg ctctttttat acaaagcatg atatttaact agggttttag tgatttttaa 143340 
aaagtttctc tttctcctta gatattcaga ccaatgcgtc tcatatgaga tgaagaaatg 143400 
tccagaagaa actatactga actgacagaa tttgttctct tgggtctaac aagccgtcca 143460 
gagctgcgag ttgctttctt ggcactgttc ctttttgtct acatagccac tgtggtagga 143520 
aacttgggga tgattatttt aatcaaagtt gattctcgac ttcacactcc catgtaattt 143580 
tttctctcca gtttgtccat tctagatctg tgtttctcca caaatttcac tcccaaaatg 143640 
ctagaaaatt tcttatcaga gaagaagacc atttcctatg caggttgttt gatgcagtgc 143700 
tatgttgtca ttgctgtggt ccttgcagag cactgcatgt tggcagtcat ggcatatgac 143760 
cgctatatgg ccatctgtaa tccattgctc tacagtagca aaatgtccca aggtgtttgt 143820 
gtccacctgg tcattgtccc ttatgtctat ggctttcttc tcagtgtgat ggaaacctta 143880 
70 75 80 

Leu Glu Ile Phe Leu Ser Glu Lys Lys Ser Ile Ser Tyr Pro Ala Cys 

85 90 95 

Leu Val Gln Cys Tyr Leu Tyr Ile Ile Leu Val His Val Glu Ile Tyr 

100 105 110 

Ile Leu Ala Val Met Ala Phe Asp 

115 120 

<210> 20 

<211> 311 

<212> PRT 

<213> Homo sapiens 

<400> 20 

Met Arg Arg Asn Cys Thr Leu Val Thr Glu Phe Ile Leu Leu Gly Leu 

1 5 10 15 

Thr Ser Arg Arg Glu Leu Gln Ile Leu Leu Phe Thr Leu Phe Leu Ala 

20 25 30 

Ile Tyr Met Val Thr Val Ala Gly Asn Leu Gly Met Ile Val Leu Ile 

35 40 45 

Gln Ala Asn Ala Trp Leu His Met Pro Met Tyr Phe Phe Leu Ser His 

50 55 60 

Leu Ser Phe Val Asp Leu Cys Phe Ser Ser Asn Val Thr Pro Lys Met 

65 70 75 80 

Leu Glu Ile Phe Leu Ser Glu Lys Lys Ser Ile Ser Tyr Pro Ala Cys 

85 90 95 

Leu Val Gln Cys Tyr Leu Phe Ile Ala Leu Val His Val Glu Ile Tyr 

100 105 110 

Ile Leu Ala Val Met Ala Phe Asp Arg Tyr Met Ala Ile Cys Asn Pro 

115 120 125 

Leu Leu Tyr Gly Ser Arg Met Ser Lys Ser Val Cys Ser Phe Leu Ile 

130 135 140 

Thr Val Pro Tyr Val Tyr Gly Ala Leu Thr Gly Leu Met Glu Thr Met 

145 150 155 160 

Trp Thr Tyr Asn Leu Ala Phe Cys Gly Pro Asn Glu Ile Asn His Phe 

165 170 175 

Tyr Cys Ala Asp Pro Pro Leu Ile Lys Leu Ala Cys Ser Asp Thr Tyr 

180 185 190 

Asn Lys Glu Leu Ser Met Phe Ile Val Ala Gly Trp Asn Leu Ser Phe 

195 200 205 

Ser Leu Phe Ile Ile Cys Ile Ser Tyr Leu Tyr Ile Phe Pro Ala Ile 

210 215 220 

Leu Lys Ile Arg Ser Thr Glu Gly Arg Gln Lys Ala Phe Ser Thr Cys 
225 230 235 240 

Gly Ser His Leu Thr Ala Val Thr Ile Phe Tyr Ala Thr Leu Phe Phe 

245 250 255 

Met Tyr Leu Arg Pro Pro Ser Lys Glu Ser Val Glu Gln Gly Lys Met 

260 265 270 

Val Ala Val Phe Tyr Thr Thr Val Ile Pro Met Leu Asn Leu Ile Ile 

275 280 285 

Tyr Ser Leu Arg Asn Lys Asn Val Lys Glu Ala Leu Ile Lys Glu Leu 

290 295 300 

Ser Met Lys Ile Tyr Phe Ser 
305 310 

<210> 21 

<211> 59 

<212> PRT 

<213> Homo sapiens 

<400> 21 

Met Ser Arg Arg Asn Tyr Thr Glu Leu Thr Glu Phe Val Leu Leu Gly 

1 5 10 15 

Leu Thr Ser Arg Pro Glu Leu Arg Val Ala Phe Leu Ala Leu Phe Leu 

20 25 30 

Phe Val Tyr Ile Ala Thr Val Val Gly Asn Leu Gly Met Ile Ile Leu 

35 40 45 

Ile Lys Val Asp Ser Arg Leu His Thr Pro Met 
50 55 

<210> 22 
<211> 30 
<212> DNA 

<213> Artificial Sequence 
<400> 22 

cctggagggt ttcaaaggct gatactttag 30 

<210> 23 
<211> 26 
<212> DNA 

<213> Artificial Sequence 
<400> 23 

ctccagcctg agcaacagag caatac 26 

<210> 24 
<211> 30 
<212> DNA 

<213> Artificial Sequence 
<400> 24 

ctcacattca ttgttcttca cagacccagc 30 

<210> 25 
<211> 24 
<212> DNA 

<213> Artificial Sequence 
<400> 25 

ccctgctggg atctggatca agac 24 
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60 

<210> 26 
<211> 18 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> sequencing oligonucleotide PrimerPU 
<400> 26 

tgtaaaacga cggccagt 18 

<210> 27 
<211> 18 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> sequencing oligonucleotide PrimerRP 
<400> 27 

caggaaacag ctatgacc 18 



